Abstract

Motivated by the pervasiveness of strong inapproximability results for Max-CSPs, we introduce a
relaxed notion of an approximate solution of a Max-CSP. In this relaxed version, loosely speaking,
the algorithm is allowed to replace the constraints of an instance by some other (possibly real-valued) 
constraints, and then only needs to satisfy as many of the new constraints as possible. 

To be more precise, we introduce the following notion of a predicate P being useful for a (real- 
valued) objective Q: given an almost satisfiable Max-P instance, there is an algorithm that beats a 
random assignment on the corresponding Max-Q instance applied to the same sets of literals. The 
standard notion of a nontrivial approximation algorithm for a Max-CSP with predicate P is exactly the 
same as saying that P is useful for P itself.

We say that P is useless if it is not useful for any Q. This turns out to be equivalent to the
following pseudo-randomness property: given an almost satisfiable instance of Max-P it is hard to find an
assignment such that the induced distribution on k-bit strings defined by the instance is not essentially
uniform.

Under the Unique Games Conjecture, we give a complete and simple characterization of useful Max- 
CSPs defined by a predicate: such a Max-CSP is useless if and only if there is a pairwise independent 
distribution supported on the satisfying assignments of the predicate. It is natural to also consider the case 
when no negations are allowed in the CSP instance, and we derive a similar complete characterization 
(under the UGC) there as well.

Finally, we also include some results and examples shedding additional light on the approximability 
of certain Max-CSPs.

1 Introduction



The motivation for this paper comes from the study of maximum constraint satisfaction problems (Max- 
CSPs). We are given a sequence of constraints, each depending on a constant number of variables, and 
the goal is to find an assignment that maximizes the number of satisfied constraints. Essentially any such 
problem is NP-hard and a large number of papers have studied the question of approximability of this class 
of problems. The standard concept of approximability is that an algorithm is a C-approximation algorithm
if, on any instance I, it outputs a number A(I) such that $C \cdot O(I) \le A(I) \le O(I)$, where O(I) is the
optimum value of I.

There are finer measures of performance. For example, one can take C above to be a function of the
optimum value O(I). That is, for each fraction of constraints satisfied by the optimal solution, we try to
determine the best solution that can be found efficiently. The only problem where this has been fully done
explicitly is Max-Cut, where, assuming the unique games conjecture, O'Donnell and Wu [OW08] have found
the entire curve of approximability. In a remarkable paper, Raghavendra [Rag08] showed that, assuming
the unique games conjecture, the best such approximation possible is the one given by a certain natural 
semidefinite programming-based (SDP) relaxation of the problem. However, understanding the performance 
of this SDP is difficult in general and in this paper we are interested in more explicit bounds. 

Max-Cut turns out to be approximable in a very strong sense. To describe these results note that for 
Max-Cut it is the case that a random assignment satisfies half of the constraints on average. Whenever the 
optimum satisfies a fraction $1/2 + \epsilon$ of the constraints then it is possible to efficiently find an assignment
that satisfies a fraction $1/2 + c(\epsilon)$ of the constraints where $c(\epsilon)$ is strictly positive, depending on $\epsilon$ [CW04].
In other words, whenever the optimal solution satisfies a non-trivial fraction of the constraints it is possible 
to efficiently find an assignment that satisfies a smaller, but still non-trivial, fraction of the constraints. 

In this paper the main focus is on the other end of the spectrum. Specifically we are interested in the 
following property: even if the optimal solution satisfies (almost) all the constraints it is still hard to find an 
assignment that satisfies a non-trivial fraction. This might sound like an unusual property, but evidence is 
mounting that most CSPs have this property. We say that such a CSP is approximation resistant (a formal 
definition appears in Section 2).

We shall focus on a special class of CSP defined by a single predicate $P : \{-1,1\}^k \to \{0,1\}$ (throughout
the paper we identify the Boolean value true with $-1$ and false with $1$). Each constraint asserts that P
applied to some k literals (each literal being either a variable or a negated variable) is true. We refer to this 
problem as Max-P, and say that P is approximation resistant if Max-P is. 

Several predicates are proven to be approximation resistant in [Has01] and the most notable cases are
when the predicate in question is the XOR, or the usual OR, of 3 literals. For the latter case, Max-3Sat, 
it is even the case that the hardness remains the same for satisfiable instances. This is clearly not the case 
for XOR since a satisfying assignment, if one exists, can be found by Gaussian elimination. Hast [Has05] 
studied predicates of arity 4 and of the (exactly) 400 different predicates, 79 are proven to be approximation 
resistant, 275 are found to be non-trivially approximable while the status of the remaining 46 predicates 
was not determined. Some results exist also for larger predicates and we return to some of these results
in Section 4. If one is willing to believe the unique games conjecture (UGC) of Khot [Kho02] then it
was established in [AH11] that an overwhelming majority of all predicates are approximation resistant.
This paper relies on a result [AM09] establishing that any predicate P such that the set of accepted strings
$P^{-1}(1)$ supports a pairwise independent distribution is, assuming the UGC, approximation resistant.

In spite of all these impressive results we want to argue that approximation resistance is not the ultimate 
hardness condition for a CSP. Approximation can be viewed as relaxing the requirements: if there is an 
assignment that satisfies a large number, or almost all, of a given set of constraints, we are content in finding 
an assignment that satisfies a lower but still non-trivial number of constraints. In some situations, instead 
of relaxing the number of constraints we want to satisfy it might make more sense to relax the constraints
themselves.

Sometimes such relaxations are very natural, for instance if considering a threshold predicate we might 
want to lower the threshold in question. It also makes sense to have a real-valued measure of success. If we 
give the full reward for satisfying the original predicate we can have a decreasing reward depending on the 
distance to the closest satisfying assignment. This is clearly natural in the threshold predicate scenario but 
can also make good sense for other predicates. 

It seems like the least we can ask for of a CSP is that when we are given an instance where we can satisfy 
(almost) all constraints under a predicate P then we can find an assignment that does something non-trivial 
for some, possibly real-valued, relaxation Q. This brings us to the key definition of our paper. 

Definition 1.1. The predicate P is useful for the real-valued function $Q : \{-1,1\}^k \to \mathbb{R}$, if and only if
there is an $\epsilon > 0$ such that given an instance of Max-P where the optimal solution satisfies a fraction $1 - \epsilon$
of the constraints, there is a polynomial time algorithm to find an assignment $x^0$ such that

\[ \frac{1}{m} \sum_{j=1}^{m} Q(x^0_j) \ge \mathbb{E}_{x \in \{-1,1\}^k}[Q(x)] + \epsilon. \]

Here $x^0_j$ denotes the k-bit string giving the values of the k literals in the j'th constraint of P under the assignment
$x^0$.

Given a notion of "useful" it is natural to define "useless". We say that P is useless for Q if, assuming
$\mathrm{P} \ne \mathrm{NP}$, it is not useful for Q. We choose to build the assumption $\mathrm{P} \ne \mathrm{NP}$ into the definition in order not to
have to state it for every theorem - the assumption is in a sense without loss of generality since if $\mathrm{P} = \mathrm{NP}$
then uselessness is the same as a related notion we call information-theoretic uselessness which we briefly
discuss in Section 3. Note that uselessness is a generalization of approximation resistance as that notion is
the property that P is useless for P itself.

This observation implies that requiring a predicate to be useless for any relaxation is a strengthening of 
approximation resistance. The property of being a relaxation of a given predicate is somewhat in the eye of 
the beholder and hence we choose the following definition. 

Definition 1.2. The predicate P is (computationally) useless if and only if it is useless for every
$Q : \{-1,1\}^k \to \mathbb{R}$.

As described in Section 4 it turns out that almost all approximation resistance proofs have indeed
established uselessness. There is a natural reason that we get this stronger property. In a standard approximation
resistance proof we design a probabilistically checkable proof (PCP) system where the acceptance criterion
is given by P and the interesting step in the proof is to analyze the soundness of this PCP. In this analysis 
we use the Fourier expansion of P and it is proved that the only term that gives a significant contribution 
is the constant term. The fact that we are looking at the same P that was used to define the PCP is usually 
of no importance. It is thus equally easy to analyze what happens to any real-valued Q. In particular, it is 
straightforward to observe that the proof of [Has01] in fact establishes that parity of size at least 3 is useless.
Similarly the proof of [AM09], showing that any predicate that supports a pairwise independent measure
is approximation resistant, also gives uselessness (but of course we still need to assume the unique games 
conjecture). 

The possibly surprising, but technically not very difficult, result that we go on to establish (in Section 5)
is that if the condition of [AM09] is violated then we can find a real-valued function for which P is useful.
Thus assuming the UGC we have a complete characterization of the property of being useless! 

Theorem 1.3. Assuming the UGC, a predicate P is (computationally) useless if and only if there is a 
pairwise independent distribution supported on $P^{-1}(1)$.



Without negated variables. We then go on in Section 6 to briefly discuss what happens in the case when
we do not allow negated variables, which in some cases may be more natural. In this situation we need to
extend the notion of a trivial algorithm in that now it might make sense to give random but biased values to
the variables. A simple example is when P accepts the all-one string in which case setting each variable to
1 with probability 1 causes us to satisfy all constraints (regardless of the instance), but probabilities strictly
between 0 and 1 might also be optimal. Taking this into account our definitions extend.

In the setting without negated variables it turns out that the unique games-based uselessness proof can 
be extended with slightly relaxed conditions with minor modifications. We are still interested in a measure, 
$\mu$, supported on strings accepted by P but we can allow two relaxations. The individual bits under $\mu$ need
not be unbiased but each bit should have the same bias. Perhaps more interestingly, the bits need not be
pairwise independent and we can allow positive (but for each pair of bits the same) correlations among the 
bits. 

Theorem 1.4 (Informal). When we do not allow negated variables, P is useless (assuming UGC) if and 
only if the accepting strings of P support such a distribution.

Note that this implies that any predicate that is useless when we allow negations is also useless when we 
do not allow negations while the converse is not true. 

A basic computationally useless predicate in this setting is odd parity of an even number of variables 
(at least 4 variables). With even parity, or with odd parity of an odd number of variables, the predicate is 
also useless, but for the trivial reason that we can always satisfy all constraints (so the guarantee that we can 
satisfy most applications of P gives no extra information). Surprisingly we need the UGC to establish the 
result for odd parity of an even number of variables. As briefly discussed in Section 6 below it seems like
new techniques are needed to establish NP-hardness results in this situation. 

Adaptive uselessness and pseudorandomness. Our definition of uselessness is not the only possible
choice. A stronger definition would be to let the algorithm choose a new objective Q based on the
actual Max-P instance I, rather than just based on P. We refer to this as adaptive uselessness, and discuss
it in Section 7. It turns out that in the settings discussed, adaptive uselessness is the same as non-adaptive
uselessness.

In the adaptive setting when we allow negations clearly the task is to find an assignment such that the 
k-bit strings appearing in the constraints do not have the uniform distribution. This is the case as we can
choose a Q which takes large values for the commonly appearing strings. Thus in this situation our results
say that even given the promise that there is an assignment such that almost all resulting k-bit strings satisfy
P, an efficient algorithm is unable to find any assignment for which the distribution on k-bit strings is not
(almost) uniform. 

Other results. When we come to investigating useful predicates and to determining pairs for which P is
useful for Q it is of great value to have extensions of the result [AM09]. These are along the lines of having
distributions supported on the strings accepted by P where most pairs of variables are uncorrelated. Details
of this can be found in Section 8.1. Then motivated by the pairwise independence condition we present (in
Section 8.2) a predicate which is the sign of a quadratic function but which is still approximation resistant.
We also take a brief look at the other end of the spectrum, and study CSPs which are highly approximable
in Section 9.

A preliminary version of this paper has appeared at the conference for Computational Complexity
[AH12].
2 Preliminaries



We have a predicate $P : \{-1,1\}^k \to \{0,1\}$. The traditional approximation problem to study is Max-P in
which an instance consists of m k-tuples of literals, each literal being either a variable or a negated variable.
The goal is to find an assignment to the variables so as to maximize the number of resulting k-bit strings that
satisfy the predicate P. To be more formal an instance is given by a set of indices $a_i^j \in [n]$ for $1 \le i \le k$ and
$1 \le j \le m$ and complementations $b_i^j \in \{-1,1\}$. The jth k-tuple of literals contains the variables $(x_{a_i^j})_{i \in [k]}$
with $x_{a_i^j}$ negated iff $b_i^j = -1$. We use the shorthand notation $P(x_{a_j}^{b_j})$ for the jth constraint.
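To make the notation concrete, here is a minimal Python sketch of an instance and of the k-bit strings it induces. The names `literal_string` and `fraction_satisfied` are ours and purely illustrative, and variables are indexed from 0 rather than from 1 as in the text.

```python
from typing import List, Sequence, Tuple

# A constraint is a pair (a, b): a holds k distinct variable indices (0-based here),
# b holds the complementation signs in {-1, 1}.
Constraint = Tuple[Sequence[int], Sequence[int]]

def literal_string(x: Sequence[int], constraint: Constraint) -> Tuple[int, ...]:
    """Return the k-bit string seen by one constraint under the assignment x.

    Multiplying by b = -1 negates a literal, since negation is x -> -x
    in the {-1, 1} encoding (true = -1, false = 1).
    """
    a, b = constraint
    return tuple(x[i] * s for i, s in zip(a, b))

def fraction_satisfied(P, x: Sequence[int], instance: List[Constraint]) -> float:
    """Fraction of constraints whose literal string is accepted by P."""
    return sum(P(literal_string(x, c)) for c in instance) / len(instance)

def xor3(y):
    # Odd parity: an odd number of true (= -1) literals means the product is -1.
    return 1 if y[0] * y[1] * y[2] == -1 else 0

# Two constraints over four variables; the second one negates its first literal.
instance = [((0, 1, 2), (1, 1, 1)), ((0, 1, 3), (-1, 1, 1))]
x = [-1, 1, 1, 1]
print(literal_string(x, instance[1]), fraction_satisfied(xor3, x, instance))
```

Under this assignment the first constraint sees the string (-1, 1, 1) and is satisfied, while the second sees (1, 1, 1) and is not.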

We do not allow several occurrences of the same variable in one constraint. In other words, $a_i^j \ne a_{i'}^j$ for
$i \ne i'$. The reason for this convention is that if the same variable appears twice we in fact have a different
predicate on a smaller number of variables. This different predicate is of course somewhat related to P but 
does not share even basic properties such as the probability that it is satisfied by a random assignment. Thus 
allowing repeated variables would take us into a more complicated situation. 

In this paper we assume that all constraints have the same weight but it is not hard to extend the results 
to the weighted case. 

Definition 2.1. For $Q : \{-1,1\}^k \to \mathbb{R}$ define

\[ E_Q = \mathbb{E}_{x \in \{-1,1\}^k}[Q(x)]. \]

Note that for a predicate P an alternative definition of $E_P$ is the probability that a uniformly random
assignment satisfies P. It follows that the trivial algorithm that just picks a uniformly random assignment
approximates Max-P within a factor $E_P$.
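As a small illustration, the quantity of Definition 2.1 can be computed by brute force for small k. The sketch below uses a helper `expectation` of our own naming and evaluates it for OR and XOR of three literals, with true encoded as -1 as in the rest of the paper.

```python
from itertools import product

def expectation(Q, k):
    """E_Q: the average of Q over all 2^k points of {-1, 1}^k."""
    points = list(product([-1, 1], repeat=k))
    return sum(Q(x) for x in points) / len(points)

def or3(x):
    # True is encoded as -1, so OR holds unless every literal equals 1 (false).
    return 1 if -1 in x else 0

def xor3(x):
    # Odd parity: the product of the three bits is -1.
    return 1 if x[0] * x[1] * x[2] == -1 else 0

print(expectation(or3, 3), expectation(xor3, 3))
```

A uniformly random assignment thus satisfies a 7/8 fraction of the constraints for OR and a 1/2 fraction for XOR, in expectation.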

Definition 2.2. The predicate P is approximation resistant if and only if, for every $\epsilon > 0$, it is NP-hard to
approximate Max-P within a factor $E_P + \epsilon$.

Another way to formulate this definition is that, again for any $\epsilon > 0$, it is NP-hard to distinguish
instances for which the optimal solution satisfies a fraction $1 - \epsilon$ of the constraints from those where the
optimal solution only satisfies a fraction $E_P + \epsilon$. One can ask for even more and we have the following
definition.

Definition 2.3. The predicate P is approximation resistant on satisfiable instances if and only if, for any
$\epsilon > 0$, it is NP-hard to distinguish instances of Max-P for which the optimal solution satisfies all the
constraints from those instances where the optimal solution only satisfies a fraction $E_P + \epsilon$ of the constraints.

A phenomenon that often appears is that if P is approximation resistant then any predicate P' that 
accepts strictly more strings is also approximation resistant. Let us introduce a concept to capture this fact. 

Definition 2.4. The predicate P is hereditarily approximation resistant if and only if any predicate P'
implied by P (i.e., whenever P(x) is true then so is P'(x)) is approximation resistant.

It turns out that 3-Lin, and indeed any parity of size at least three, is hereditarily approximation resistant. 
There are also analogous notions for satisfiable instances but as this is not the focus of the present paper we 
do not give the formal definition here. One of the few examples of a predicate that is approximation resistant 
but not hereditarily so is a predicate studied by Guruswami et al. [GLST98]. We discuss this predicate in
more detail in Section 8.1 below.

Let us recall the definition of pairwise independence.

Definition 2.5. A distribution $\mu$ over $\{-1,1\}^k$ is biased pairwise independent if, for some $p \in [0,1]$, we
have $\Pr_\mu[x_i = 1] = p$ for every $i \in [k]$ and $\Pr_\mu[x_i = 1 \wedge x_j = 1] = p^2$ for every $1 \le i < j \le k$ (i.e., if all
two-dimensional marginal distributions are equal and product distributions).

We say that $\mu$ is pairwise independent if it is biased pairwise independent with $p = 1/2$ (i.e., if the
marginal distribution on any pair of coordinates is uniform).

Finally we need a new definition of a distribution that we call uniformly positively correlated. 

Definition 2.6. A distribution $\mu$ over $\{-1,1\}^k$ is uniformly positively correlated if, for some $p, \rho \in [0,1]$,
with $\rho \ge p^2$, we have $\Pr_\mu[x_i = 1] = p$ for every $i \in [k]$ and $\Pr_\mu[x_i = 1 \wedge x_j = 1] = \rho$ for every
$1 \le i < j \le k$ (i.e., if all two-dimensional marginal distributions are equal and the bits are positively
correlated).

Note that we allow $\rho = p^2$ and thus any biased pairwise independent distribution is uniformly positively
correlated.
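The three notions above are easy to check mechanically for an explicitly given distribution. The following sketch assumes a distribution is represented as a dictionary from k-bit tuples to probabilities, with k at least 2; the function names are ours.

```python
from itertools import combinations, product

def marginals(mu, k):
    """Return ({i: Pr[x_i = 1]}, {(i, j): Pr[x_i = 1 and x_j = 1]}) for i < j."""
    single = {i: sum(w for x, w in mu.items() if x[i] == 1) for i in range(k)}
    pair = {(i, j): sum(w for x, w in mu.items() if x[i] == 1 and x[j] == 1)
            for i, j in combinations(range(k), 2)}
    return single, pair

def is_uniformly_positively_correlated(mu, k, tol=1e-9):
    single, pair = marginals(mu, k)
    p, rho = single[0], pair[(0, 1)]
    same_bias = all(abs(v - p) <= tol for v in single.values())
    same_corr = all(abs(v - rho) <= tol for v in pair.values())
    return same_bias and same_corr and rho >= p * p - tol

def is_biased_pairwise_independent(mu, k, tol=1e-9):
    single, pair = marginals(mu, k)
    p = single[0]
    return (all(abs(v - p) <= tol for v in single.values())
            and all(abs(v - p * p) <= tol for v in pair.values()))

def is_pairwise_independent(mu, k, tol=1e-9):
    single, _ = marginals(mu, k)
    return is_biased_pairwise_independent(mu, k, tol) and abs(single[0] - 0.5) <= tol

# Uniform distribution on the four strings of {-1, 1}^3 with product -1.
parity = {x: 0.25 for x in product([-1, 1], repeat=3) if x[0] * x[1] * x[2] == -1}
# All mass on the all-ones string: bias p = 1 and rho = 1 = p^2.
point = {(1, 1, 1): 1.0}
# Three copies of one fair bit: p = 1/2 but rho = 1/2 > p^2.
copies = {(1, 1, 1): 0.5, (-1, -1, -1): 0.5}

print(is_pairwise_independent(parity, 3),
      is_biased_pairwise_independent(point, 3),
      is_uniformly_positively_correlated(copies, 3))
```

The three example distributions separate the notions: `parity` is pairwise independent, `point` is biased pairwise independent but not unbiased, and `copies` is uniformly positively correlated without being biased pairwise independent.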

3 Information-Theoretic Usefulness 

Clearly there must be some relation between P and Q for our notion to be interesting and let us discuss this 
briefly in the case when Q is a predicate. 

If P and Q are not strongly related then it is possible to have instances where we can satisfy all
constraints when applying P and only an $E_Q$ fraction for Q. A trivial example would be if P is OR of three
variables and Q is XOR. Then given the two constraints $(x_1, x_2, x_3)$ and $(\bar{x}_1, x_2, x_3)$ it is easy to satisfy
both constraints under P but clearly exactly one is always satisfied under Q. Thus we conclude that OR is
not useful for XOR.
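The OR versus XOR example can be verified exhaustively. The sketch below assumes that the second constraint is the first one with its first literal negated, and enumerates all eight assignments.

```python
from itertools import product

def or3(y):
    # True is -1, so OR fails only on the all-false string (1, 1, 1).
    return 1 if -1 in y else 0

def xor3(y):
    return 1 if y[0] * y[1] * y[2] == -1 else 0

def literals(x, signs):
    return tuple(v * s for v, s in zip(x, signs))

# Two constraints on the same three variables; the second negates the first literal.
sign_patterns = [(1, 1, 1), (-1, 1, 1)]

def satisfied(pred, x):
    """Number of the two constraints satisfied by assignment x under pred."""
    return sum(pred(literals(x, s)) for s in sign_patterns)

assignments = list(product([-1, 1], repeat=3))
best_or = max(satisfied(or3, x) for x in assignments)
xor_counts = {satisfied(xor3, x) for x in assignments}
print(best_or, xor_counts)
```

Some assignment satisfies both constraints under OR, whereas every assignment satisfies exactly one of them under XOR, which is the $E_Q = 1/2$ fraction.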

As another example let P be equality of two bits and Q non-equality and let the constraints be all pairs 
$(x_i, x_j)$ for $1 \le i < j \le n$ (unnegated). It is possible to satisfy all constraints under P but it is not difficult
to see that the maximal fraction goes to 1/2 under Q as n tends to infinity. We can note that the situation is 
the same for P being odd parity and Q being even parity if the size is even, while if the size of the parity is 
odd the situation is completely the opposite as negating a good assignment for P gives a good assignment 
for Q. 
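For the equality versus non-equality example the optimum under Q has a closed form, which the following sketch evaluates. The function name is ours, and the observation in its docstring is the only reasoning it relies on.

```python
def max_fraction_unequal(n):
    """Best fraction of the pairs (x_i, x_j), i < j, that can be made unequal.

    An assignment with t variables set to -1 makes exactly t * (n - t) pairs
    unequal, so it suffices to maximise over t rather than over 2^n assignments.
    """
    total_pairs = n * (n - 1) // 2
    return max(t * (n - t) for t in range(n + 1)) / total_pairs

for n in (4, 10, 100, 1000):
    print(n, max_fraction_unequal(n))
```

For even n the value is n / (2(n - 1)), which decreases towards 1/2 as n grows, in line with the claim in the text.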

After these examples let us take a look in more detail at usefulness in an information-theoretic sense. It 
is not difficult to see that perfect and almost-perfect completeness are equivalent in this situation. 

Definition 3.1. A predicate P is information-theoretically useless for Q if, for any $\epsilon > 0$ there is an instance
such that

\[ \max_x \frac{1}{m} \sum_{j=1}^{m} P(x_{a_j}^{b_j}) \ge 1 - \epsilon \]

while

\[ \max_x \frac{1}{m} \sum_{j=1}^{m} Q(x_{a_j}^{b_j}) \le E_Q + \epsilon. \]

A trivial remark is that in the information-theoretic setting we cannot have total uselessness as P is
always information-theoretically useful for itself or any predicate implied by P (unless P is trivial).

Let us analyze the above definition. Let $\mu$ be a probability measure and let $\mu_p$ be the distribution obtained
by first picking a string according to $\mu$ and then flipping each coordinate with probability p. Note that p need
not be small and p = 1 is one interesting alternative as illustrated by the parity example above.

For a given $\mu$ let $\mathrm{Opt}(Q, \mu)$ be the maximum over p of the expected value of Q(x) when x is chosen
according to $\mu_p$. We have the following theorem.

Theorem 3.2. The predicate P is information-theoretically useless for Q if and only if there exists a measure
$\mu$ supported on strings accepted by P such that $\mathrm{Opt}(Q, \mu) = E_Q$.

Proof. Let us first see that if $\mathrm{Opt}(Q, \mu) > E_Q$ for every $\mu$ then P is indeed useful for Q. Note that the
space of measures on a finite set is compact and thus we have $\mathrm{Opt}(Q, \mu) \ge E_Q + \delta$ for some fixed $\delta > 0$
for any measure $\mu$.

Consider any instance with

\[ \max_x \frac{1}{m} \sum_{j=1}^{m} P(x_{a_j}^{b_j}) = 1 \]

and let us consider the strings $(x_{a_j}^{b_j})_{j=1}^{m}$ when x is the optimal solution. These are all accepted by P and
considering with which proportion each string appears we let this define a measure $\mu$ of strings accepted by
P. By the definition of $\mathrm{Opt}(Q, \mu)$, there is some p such that a random string from $\mu_p$ gives an expected
value of at least $E_Q + \delta$ for Q(x). It follows that flipping each bit in the optimal assignment for P with
probability p we get an assignment such that

\[ \mathbb{E}\left[ \frac{1}{m} \sum_{j=1}^{m} Q(x_{a_j}^{b_j}) \right] \ge E_Q + \delta \]

and thus P is information-theoretically useful for Q.

For the reverse conclusion we construct a randomized instance. Let $\mu$ be the measure guaranteed to exist
by the assumption of the theorem. We make sure that the all-one solution always gives an optimal value by
setting each $a_j$ to a uniformly random set of indices from [n] which are all different, and setting $b_j$ such that
the resulting string $x_{a_j}^{b_j}$ has the distribution given by $\mu$.

Now we claim that, for an assignment with a fraction $1 - p$ of the variables set to 1, the expected value (over the
choice of instance) of $\frac{1}{m} \sum_{j=1}^{m} Q(x_{a_j}^{b_j})$ is within an additive $O(1/n)$ of $\mathbb{E}[Q(x)]$ when x is chosen according
to $\mu_p$. This is more or less immediate from the definition and the small error comes from the fact that we
require the chosen variables to be different creating a small bias. Taking m sufficiently large compared to n
the theorem now follows from standard large deviation estimates and an application of the union bound. □
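The quantity Opt(Q, mu) from Theorem 3.2 can be explored numerically for small predicates. The sketch below is only an illustration: it approximates the maximum over p by a grid search (our choice, not part of the proof), and takes mu to be uniform on the strings accepted by odd parity, with Q being even parity.

```python
from itertools import product

def noisy_expectation(Q, mu, p):
    """E[Q(x)] when x is drawn from mu and each coordinate is then
    flipped independently with probability p (the distribution mu_p)."""
    total = 0.0
    for x, w in mu.items():
        for flips in product([0, 1], repeat=len(x)):
            prob = 1.0
            for f in flips:
                prob *= p if f else 1 - p
            y = tuple(-v if f else v for v, f in zip(x, flips))
            total += w * prob * Q(y)
    return total

def opt(Q, mu, steps=100):
    """Grid-search approximation of Opt(Q, mu), the maximum over p in [0, 1]."""
    return max(noisy_expectation(Q, mu, t / steps) for t in range(steps + 1))

def odd_parity(x):
    result = 1
    for v in x:
        result *= v
    return 1 if result == -1 else 0

def even_parity(x):
    return 1 - odd_parity(x)

def uniform_on(pred, k):
    support = [x for x in product([-1, 1], repeat=k) if pred(x)]
    return {x: 1 / len(support) for x in support}

print(opt(even_parity, uniform_on(odd_parity, 2)),
      opt(even_parity, uniform_on(odd_parity, 3)))
```

For size 2 the search returns 1/2, which equals $E_Q$ for even parity, while for size 3 it returns 1, attained at p = 1 where every bit is flipped. This matches the earlier remark that the even and odd sizes behave in opposite ways.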

Let us return to our main interest of studying usefulness in a computational context. 



4 Some Examples and Easy Theorems 

We have an almost immediate consequence of the definitions. 

Theorem 4.1. If P is useless then P is hereditarily approximation resistant. 

Proof. Let P' be any predicate implied by P. The fact that P is useless for P' states that it is hard to 
distinguish instances where we can satisfy P (and hence P') almost always from those where we can only 
satisfy P' on an $E_{P'}$ fraction of the constraints. The theorem follows. □

Clearly we have the similar theorem for satisfiable instances. 

Theorem 4.2. If P is useless on satisfiable instances then P is hereditarily approximation resistant on 
satisfiable instances.

The standard way to prove that a predicate P is approximation resistant is to design a Probabilistically
Checkable Proof (PCP) where the acceptance criterion is given by P and to prove that we have almost
perfect completeness (i.e., correct proofs of correct statements are accepted with probability $1 - \epsilon$) and
soundness $E_P + \epsilon$. Usually it is easy to analyze the completeness and the main difficulty is the soundness.
In this analysis of soundness, P is expanded using the discrete Fourier transform and the expectation of each 
term is analyzed separately. 

The most robust way of making this analysis is to prove that each non-constant monomial has
expectation at most $\epsilon$. As any real-valued function can be expanded by the Fourier transform this argument actually
shows that the predicate in question is computationally useless. To show the principle let us prove the 
following theorem. 

Theorem 4.3. For any $k \ge 3$, parity of k variables is computationally useless.

Proof. To avoid cumbersome notation let us only give the proof in the case k = 3. We assume the reader is 
familiar with the PCP defined for this case in [Has01] to prove that Max-3-Lin is approximation resistant.
We claim that the same instances show that 3-Lin is computationally useless. 

Indeed consider an arbitrary $Q : \{-1,1\}^3 \to \mathbb{R}$ and consider its Fourier expansion

\[ Q(x) = \sum_{S \subseteq [3]} \hat{Q}_S \chi_S(x). \tag{1} \]

Now we need to consider $\sum_{j=1}^{m} Q(x_{a_j}^{b_j})$ and we can expand each term using (1) and switch the order of
summation. Remember that $\hat{Q}_\emptyset = E_Q$ and thus we need to make sure that

\[ \sum_{j=1}^{m} \chi_S(x_{a_j}^{b_j}) \tag{2} \]

is small for any non-empty S (unless there is a good strategy for the provers in the underlying two-prover
game). This is done in [Has01] for S = {1, 2, 3} as this is the only Fourier coefficient that appears in the
expansion of parity itself.

For smaller, non-empty, S, it is easy to see that (2) equals 0. Bits read corresponding to $A(f)$ and $B(g_i)$
are pairwise independent and pairing terms for $f$ and $-f$ proves that $\mathbb{E}[B(g_1) B(g_2)] = 0$.

The result follows in the case of parity of 3 variables and the extension to the general case is straightfor- 
ward and left to the reader. □ 
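The claim that parity has only one non-constant Fourier coefficient is easy to confirm by direct computation. The sketch below uses the standard definition of the coefficient of S as the expectation of Q(x) times the character of S; the helper name and the 0-based coordinates are our conventions.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(Q, k):
    """Return {S: coefficient}, where the coefficient of S is E_x[Q(x) * chi_S(x)]
    and chi_S(x) is the product of x_i over i in S (an empty product is 1)."""
    points = list(product([-1, 1], repeat=k))
    coeffs = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            coeffs[S] = sum(Q(x) * prod(x[i] for i in S) for x in points) / len(points)
    return coeffs

def xor3(x):
    # Odd parity of three bits, i.e. (1 - x_1 x_2 x_3) / 2 as a polynomial.
    return 1 if x[0] * x[1] * x[2] == -1 else 0

coeffs = fourier_coefficients(xor3, 3)
nonzero = {S: c for S, c in coeffs.items() if c != 0}
print(nonzero)
```

Only the empty set, whose coefficient is $E_Q = 1/2$, and the full set {1, 2, 3} survive. For a general Q all eight coefficients may be nonzero, which is why the argument above has to bound the sum (2) for every non-empty S.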

As stated above most approximation resistance results turn out to give uselessness without any or only 
minor modifications of the proofs. In particular, if one is looking for sparse useless predicates, the predicates 
by Samorodnitsky and Trevisan [ST00] (accepting $2^{2d}$ strings of arity $2d + d^2$) and of Engebretsen and
Holmerin [EH08] (accepting $2^d$ strings of arity $d(d-1)/2$) are computationally useless. For the former
result, the proof in [HW03] is easier to extend to give computational uselessness.

Turning to satisfiable instances, for arity 3, the predicate

\[ (x_1 = 1) \vee (x_1 \oplus x_2 \oplus x_3) \]

studied in [Has01] is computationally useless even on satisfiable instances. Turning to sparse predicates
of larger arity the predicates defined by Håstad and Khot [HK05] which accept $2^{4k}$ inputs and have arity
$4k + k^2$, have the same property. That paper presents two different predicates with these parameters and
although it is likely that the result holds for both predicates we have only verified this for "the almost disjoint
sets PCP". If we are willing to assume the unique games conjecture by Khot [Kho02] we can use the results
of [AM09] to get very strong results.

Theorem 4.4. Let P be a predicate such that the strings accepted by P support a pairwise independent
measure. Then, assuming the unique games conjecture, P is computationally useless.

This follows immediately from the proof of [AM09] as the proof shows that the expectation of each
non-constant character is small. 

As the unique games conjecture has imperfect completeness there is no natural way to use it to prove that 
certain predicates are computationally useless on satisfiable instances. We note, however, that the result of 
O'Donnell and Wu [OW09] that establishes that the 3-ary predicate "not two" is approximation resistant on 
satisfiable instances, based on any d-to-1 conjecture, establishes the same predicate to be computationally
useless on satisfiable instances. 



5 The Main Usefulness Result 

In this section we present our main algorithm showing that Theorem 4.4 is best possible in that any predicate
that does not support a pairwise independent measure is in fact not computationally useless. We have the
following result which is proved in [AH11] but, as it is natural and not very difficult, we suspect that it is
not original to that paper.

Theorem 5.1. Suppose that the set of inputs accepted by predicate P does not support a pairwise
independent measure. Then there is a real-valued quadratic polynomial Q such that $Q(x) > E_Q$ for any
$x \in P^{-1}(1)$.

Proof Sketch. The full proof appears in [AH11] but let us give a sketch of the proof. For each $x \in \{-1,1\}^k$
we can define a point $x^{(2)}$ in $k + \binom{k}{2}$ real dimensions where the coordinates are given by the coordinates
of x as well as any pairwise product of coordinates $x_i x_j$. The statement that a set S supports a pairwise
independent measure is equivalent to the origin being in the convex hull of the points $\{x^{(2)} \mid x \in S\}$. If
the origin is not in the convex hull of these points then there is a separating hyperplane and this hyperplane
defines the quadratic function Q. □
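As a small worked instance of Theorem 5.1, consider majority of three literals, an example of our choosing that is not taken from the text. Every accepted string has at least two coordinates equal to -1, so no distribution on the accepted strings has unbiased bits, and a separating hyperplane can be written down by hand. The sketch below checks this, with the weight vector `W` being our hand-picked hyperplane normal.

```python
from itertools import combinations, product

def lift(x):
    """Map x to the point whose coordinates are the x_i followed by all x_i * x_j, i < j."""
    return tuple(x) + tuple(x[i] * x[j] for i, j in combinations(range(len(x)), 2))

def majority3(x):
    # True is -1, so majority holds when at least two coordinates equal -1.
    return 1 if sum(x) < 0 else 0

# Hand-picked normal vector in the 3 + 3 lifted dimensions (an assumption of this
# example); it only uses the linear coordinates, so Q(x) = -(x_1 + x_2 + x_3).
W = (-1, -1, -1, 0, 0, 0)

def Q(x):
    return sum(w * c for w, c in zip(W, lift(x)))

cube = list(product([-1, 1], repeat=3))
accepted = [x for x in cube if majority3(x)]
E_Q = sum(Q(x) for x in cube) / len(cube)
min_on_accepted = min(Q(x) for x in accepted)
print(E_Q, min_on_accepted)
```

Here $E_Q = 0$ while Q is at least 1 on every accepted string, so the lifted accepted points all lie strictly on one side of a hyperplane through the origin.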

We now have the following theorem. 

Theorem 5.2. Let P be a predicate whose accepting inputs do not support a pairwise independent measure 
and let Q be the quadratic function proved to exist by Theorem 5.1. Then P is useful for Q.

Proof. To make the situation more symmetric let us introduce a variable $x_0$ which always takes the value 1
and replace the linear terms $x_i$ by $x_0 x_i$ and drop any constant term in Q. This makes Q homogeneous of
degree 2. Note that negating all inputs does not change the value of Q and thus any solution with $x_0 = -1$
can be transformed to a legitimate solution by negating all variables. As each term is unbiased we have
$E_Q = 0$ and thus the goal is to find an assignment that gives $\sum_{j=1}^{m} Q(x_{a_j}^{b_j}) \ge \delta m$ for some absolute constant
$\delta$. Now let

\[ C = \max_{x \in \{-1,1\}^k} -Q(x), \qquad c = \min_{x \in P^{-1}(1)} Q(x). \]

By assumption we have that c and C are fixed constants where c is strictly larger than 0. Let D be the sum
of the absolute values of all coefficients of Q.

Let us consider our objective function

\[ F(x) = \sum_{j=1}^{m} Q(x_{a_j}^{b_j}) \]

which is a quadratic polynomial with the sum of the absolute values of coefficients bounded by Dm. As we
are guaranteed that we have an assignment that satisfies at least $(1 - \epsilon)m$ clauses we know that the optimal
value of F is at least $(1 - \epsilon)cm - \epsilon Cm = cm - (c + C)\epsilon m$.

Consider the standard semidefinite relaxation where we replace each product $x_i x_j$ by an inner product
$\langle v_i, v_j \rangle$ for unit length vectors $v_i$. This semidefinite program can be solved with arbitrary accuracy and let us
for notational convenience assume that we have an optimal solution which by assumption has an objective
value at least $cm - (c + C)\epsilon m$.

To round the vector valued solution back to a Boolean valued solution we use the following rounding
guided by a positive constant B.

1. Pick a random vector r by picking each coordinate to be an independent normal variable with mean 0
and variance 1.

2. For each i, if $|\langle v_i, r \rangle| \le B$ set $p_i = \frac{B + \langle v_i, r \rangle}{2B}$ and otherwise set $p_i = \frac{1}{2}$.

3. Set $x_i = 1$ with probability $p_i$ independently for each i and otherwise $x_i = -1$.

Remember that if $x_0$ gets the value $-1$ we negate all variables. The lemma below is the key to the analysis.

Lemma 5.3. We have

\[ \left| \mathbb{E}[x_i x_j] - \frac{1}{B^2} \langle v_i, v_j \rangle \right| \le b e^{-B^2/2} \]

for some absolute constant b.

Proof. If $|\langle v_i, r \rangle| \le B$ and $|\langle v_j, r \rangle| \le B$ then $\mathbb{E}[x_i x_j] = \frac{1}{B^2} \mathbb{E}_r[\langle v_i, r \rangle \langle v_j, r \rangle]$. Now it is not difficult to
see that $\mathbb{E}_r[\langle v_i, r \rangle \langle v_j, r \rangle] = \langle v_i, v_j \rangle$ and thus using the fact that $\Pr[|\langle v_i, r \rangle| > B] \le \frac{b}{2} e^{-B^2/2}$ for a suitable
constant b, the lemma follows. □
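The rounding procedure can be sketched in a few lines of Python. This is only an illustration under the assumption that step 2 uses $p_i = (B + \langle v_i, r \rangle)/(2B)$; the function names are ours, and the vectors are assumed to come from some SDP solver that is not shown.

```python
import random

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def rounding_probabilities(vectors, r, B):
    """Step 2: p_i = (B + <v_i, r>) / (2B) if |<v_i, r>| <= B, and 1/2 otherwise."""
    probs = []
    for v in vectors:
        t = inner(v, r)
        probs.append((B + t) / (2 * B) if abs(t) <= B else 0.5)
    return probs

def round_solution(vectors, B, rng=random):
    """Steps 1 to 3 of the rounding; index 0 plays the role of x_0."""
    dim = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]       # step 1: Gaussian direction
    probs = rounding_probabilities(vectors, r, B)       # step 2: biases p_i
    x = [1 if rng.random() < p else -1 for p in probs]  # step 3: independent bits
    if x[0] == -1:                                      # enforce x_0 = 1
        x = [-v for v in x]
    return x

# For a fixed r with both inner products within [-B, B], the conditional
# expectation of x_i x_j is (2 p_i - 1)(2 p_j - 1) = <v_i, r><v_j, r> / B^2.
p = rounding_probabilities([(1.0, 0.0), (0.0, 1.0)], (0.5, 1.0), 2.0)
print(p, (2 * p[0] - 1) * (2 * p[1] - 1))
```

The printed product equals 0.5 · 1.0 / 2², which is the identity behind the first step of the proof of Lemma 5.3; averaging over r then gives the inner product of the two vectors divided by B², up to the truncation error.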

Taking all the facts together we get that the obtained Boolean solution has expected value at least

$$\frac{1}{B^2}\left(cm - (c + C)\epsilon m\right) - b e^{-B^2/2} Dm.$$

If we choose $\epsilon = \frac{c}{2(c+C)}$ and then $B$ a sufficiently large constant we see that this expected value is at least
$\delta m$ for some absolute constant $\delta > 0$. The theorem follows. □
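The rounding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vectors are taken as given (no SDP solver is invoked), index 0 plays the role of $x_0$, and the function names are hypothetical.

```python
import random

def inner(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rounding_probabilities(vectors, r, B):
    """Step 2: p_i = (B + <v_i, r>) / (2B) if |<v_i, r>| <= B, else 1/2."""
    probs = []
    for v in vectors:
        t = inner(v, r)
        probs.append((B + t) / (2 * B) if abs(t) <= B else 0.5)
    return probs

def round_solution(vectors, B, rng):
    """Steps 1-3, followed by negating everything if x_0 came out as -1."""
    dim = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]       # step 1: Gaussian vector
    probs = rounding_probabilities(vectors, r, B)       # step 2: biases
    x = [1 if rng.random() < p else -1 for p in probs]  # step 3: independent rounding
    if x[0] == -1:                                      # index 0 is assumed to be x_0
        x = [-xi for xi in x]
    return x
```

Conditioned on $r$ and on both inner products lying in $[-B, B]$, the product $(2p_i - 1)(2p_j - 1)$ equals $\langle v_i, r \rangle \langle v_j, r \rangle / B^2$, which is the quantity Lemma 5.3 averages over $r$.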
6 The Case of No Negation

In our definition we are currently allowing negation for free. Traditionally this has not been the choice in 
most of the CSP-literature. Allowing negations does make many situations more smooth but both cases are 
of importance and let us here outline what happens in the absence of negation. We call the resulting class 
Max-P+. 

In this case the situation is different and small changes in $P$ may result in large differences in the performance
of "trivial" algorithms. In particular, if $P$ accepts the all-zero or all-one string then it is trivial to satisfy all
constraints by setting each variable to 0 in the first case and each variable to 1 in the second case.

We propose to extend the set of trivial algorithms to allow the algorithm to find a bias $r \in [-1, 1]$ and
then set all variables randomly with expectation $r$, independently. The algorithm to outperform is then the
algorithm with the optimal value of $r$. Note that this algorithm is still oblivious to the instance as the optimal
$r$ depends solely on $P$. We extend the definition of $E_Q$ for this setting.

Definition 6.1. For $Q : \{-1, 1\}^k \to \mathbb{R}$ and $r \in [-1, 1]$, define

$$E_Q(r) = \mathop{E}_{x \in \{-1,1\}^k_{(r)}}[Q(x)], \qquad E_Q^+ = \max_{r \in [-1,1]} E_Q(r),$$

where $\{-1, 1\}^k_{(r)}$ denotes the $r$-biased hypercube.



Using this definition we now get extensions of the definitions of approximation resistance and uselessness
to Max-$P^+$, and we say that $P$ is positively approximation resistant or positively useless.
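As a small illustration of Definition 6.1, the sketch below evaluates the biased expectation by brute force and approximates the best bias by a grid search. The function names and the grid resolution are assumptions made for this example; a grid only approximates the maximum in general.

```python
from itertools import product

def biased_expectation(Q, k, r):
    """Expectation of Q when each x_i is 1 with probability (1 + r) / 2, independently."""
    total = 0.0
    for x in product((-1, 1), repeat=k):
        weight = 1.0
        for xi in x:
            weight *= (1 + r * xi) / 2   # (1+r)/2 for x_i = 1, (1-r)/2 for x_i = -1
        total += weight * Q(x)
    return total

def best_bias(Q, k, steps=200):
    """Grid search over r in [-1, 1]; returns (best value found, bias attaining it)."""
    grid = [-1 + 2 * t / steps for t in range(steps + 1)]
    return max((biased_expectation(Q, k, r), r) for r in grid)
```

For instance, for the indicator of the string $(1, \ldots, 1)$ the biased expectation is $((1 + r)/2)^k$, so the best bias is $r = 1$ with value 1, matching the remark that a predicate accepting a constant string is trivially satisfiable without negations.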

6.1 Positive usefulness in the information theoretic setting 

The results of Section 3 are not difficult to extend and we only give an outline. The main new component to
address is the fact that 0 and 1 are not symmetric any longer.

As before let $\mu$ be a probability measure and let $\mu^{p,q}$ be the distribution obtained by first picking a string
according to $\mu$ and then flipping each coordinate that is one to a zero with probability $p$ and each
coordinate that is zero to a one with probability $q$ (of course all independently). For a given $\mu$ let $\mathrm{Opt}^+(Q, \mu)$
be the maximum over $p$ and $q$ of the expected value of $Q(x)$ when $x$ is chosen according to $\mu^{p,q}$. We have
the following theorem.

Theorem 6.2. The predicate $P$ is positively information-theoretically useless for $Q$ if and only if there exists
a measure $\mu$ supported on strings accepted by $P$ such that $\mathrm{Opt}^+(Q, \mu) = E_Q^+$.

Proof. The proof follows the proof of Theorem 3.2 and we leave the easy modifications to the reader. □

Let us return to the more interesting case of studying positive uselessness in the computational setting.

6.2 Positive usefulness in the computational setting 

Also in this situation we can extend the result from the situation allowing negations by using very similar
techniques. We first extend the hardness result Theorem 4.4 based on pairwise independence to this setting
and we can now even allow a uniformly positively correlated distribution.

Theorem 6.3. Let $P$ be a predicate such that the strings accepted by $P$ support a uniformly positively
correlated distribution. Then, assuming the Unique Games Conjecture, $P$ is positively useless.

A similar theorem was noted in [Aus10], but that theorem only applied to pairwise independent distributions.
The relaxed condition that the distribution only needs to be positively correlated is crucial to us as
it allows us to get a tight characterization. As the proof of Theorem 6.3 has much in common with the proof
of Theorem 8.6 stated below we give the proofs of both theorems in Section 10.

Let us turn to establishing the converse of Theorem 6.3. We start by extending Theorem 5.1.

Theorem 6.4. Suppose that the set of inputs accepted by predicate $P$ does not support a uniformly positively
correlated measure. Then there is a real-valued quadratic polynomial $Q$ such that $Q(x) > E_Q^+$ for any
$x \in P^{-1}(1)$. Furthermore, $Q$ can be chosen such that the optimal bias $r$ giving the value $E_Q^+$ satisfies
$|r| < 1$.

Proof. As in the proof of Theorem 5.1, for each $x \in \{-1, 1\}^k$ we can define a point $x^{(2)}$ in $k + \binom{k}{2}$ real
dimensions where the coordinates are given by the coordinates of $x$ as well as any pairwise product of
coordinates $x_i x_j$. We consider two convex bodies, $K_1$ and $K_2$, where $K_1$ is the same body we saw in the
proof of Theorem 5.1: the convex hull of $x^{(2)}$ for all $x$ accepted by $P$.

For each $b \in [-1, 1]$ we have a point $y_b$ with the first $k$ coordinates equal to $b$ and the rest of the
coordinates equal to $b^2$. We let $K_2$ be the convex hull of all these points.

The hypothesis of the theorem is now equivalent to the fact that $K_1$ and $K_2$ are disjoint. Any hyperplane
separating these two convex sets would be sufficient for the first part of the theorem but to make sure that
the optimal $r$ satisfies $|r| < 1$ we need to consider how to find this hyperplane more carefully.

Suppose $p_2$ is a point in $K_2$ such that $d(p_2, K_1)$, i.e., the distance from $p_2$ to $K_1$, is minimal. Furthermore
let $p_1$ be the point in $K_1$ minimizing $d(p_1, p_2)$. One choice for the separating hyperplane is the hyperplane
which is orthogonal to the line through $p_1$ and $p_2$ and which intersects this line at the midpoint between
$p_1$ and $p_2$. We get a corresponding quadratic form, $Q$, and it is not difficult to see that the maximum of $Q$
over $K_2$ is taken at $p_2$ (and possibly at some other points). Thus if we can make sure that $p_2$ does not equal
$(1^k, 1^{\binom{k}{2}})$ or $(-1^k, 1^{\binom{k}{2}})$ we are done.

We make sure that this is the case by first applying a linear transformation to the space. Note that
applying a linear transformation does not change the property that $K_1$ and $K_2$ are non-intersecting convex
bodies but it does change the identity of the points $p_1$ and $p_2$.

As $P$ does not support a uniformly positively correlated measure it does not accept either of the points
$1^k$ or $-1^k$, as a measure concentrated on such a point is uniformly positively correlated. This implies that
$K_1$ is contained in the strip

$$\left| \sum_{i=1}^{k} y_i \right| \le k - 2.$$

We also have that $K_2$ is contained in the strip

$$\left| \sum_{i=1}^{k} y_i \right| \le k,$$

and that it contains points with the given sum taking any value in the above interval. Furthermore the points
we want to avoid satisfy $|\sum_{i=1}^{k} y_i| = k$. Now apply a linear transformation that stretches space by a large
factor in the direction of the vector $(1^k, 0^{\binom{k}{2}})$ while preserving the space in any direction orthogonal to this
vector. It is easy to see that for a large enough stretch factor, none of the points $(1^k, 1^{\binom{k}{2}})$ or $(-1^k, 1^{\binom{k}{2}})$ can
be the point in $K_2$ that is closest to $K_1$. The theorem follows. □
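The objects in this proof are easy to write down explicitly. The following sketch, with hypothetical function names, builds the embedding of a string into $k + \binom{k}{2}$ dimensions and the points spanning $K_2$, and checks the strip bound for strings other than the two constant ones. It does not compute the separating hyperplane.

```python
from itertools import combinations, product

def embed(x):
    """Map x in {-1,1}^k to (x, all pairwise products x_i x_j with i < j)."""
    return tuple(x) + tuple(x[i] * x[j] for i, j in combinations(range(len(x)), 2))

def y_point(b, k):
    """The point y_b: k coordinates equal to b, then C(k,2) coordinates equal to b^2."""
    return (b,) * k + (b * b,) * (k * (k - 1) // 2)

def max_abs_sum_excluding_constants(k):
    """Largest |sum_i x_i| over all x in {-1,1}^k other than 1^k and -1^k."""
    return max(abs(sum(x)) for x in product((-1, 1), repeat=k) if len(set(x)) > 1)
```

Since every non-constant string has coordinate sum at most $k - 2$ in absolute value, so does every convex combination of embedded non-constant strings, which is the strip containing $K_1$.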

Given Theorem 5.2 the next theorem should be no surprise.

Theorem 6.5. Let $P$ be a predicate whose set of accepting inputs does not support a uniformly positively
correlated measure and $Q$ be the quadratic function proved to exist by Theorem 6.4. Then $P$ is positively
useful for $Q$.



Proof. The proof is a small modification of the proof of Theorem 5.2 and let us only outline these modifications.

Let $r$ be the optimal bias of the inputs to get the best expectation of $Q$ and let us consider the expected
value of $\frac{1}{m} \sum_{j=1}^{m} Q(x^{a_j})$ given that we set $x_i$ to one with probability $(1 + r + y_i)/2$. This expectation can
be written as a quadratic form in the $y_i$ and we want to optimize this quadratic form under the conditions that
$|r + y_i| \le 1$ for any $i$. Note that the constant term is $E_Q^+$ and if we introduce a new variable $y_0$ that always
takes the value 1 we can write the resulting expectation as

$$E_Q^+ + \sum_{i \ne j} c_{ij} y_i y_j \qquad (3)$$

for some real coefficients $c_{ij}$. As before we relax this to a semi-definite program by replacing the products
$y_i y_j$ in (3) by inner products $\langle v_i, v_j \rangle$ and relaxing the constraints to

$$\| r v_0 + v_i \| \le 1$$

for any $i \ge 1$ and $\|v_0\| = 1$. Solving this semi-definite program we are now in essentially the same situation
as in the proof of Theorem 5.2. The fact that $|r| < 1$ ensures that a sufficiently large scaling of the inner
products results in probabilities in the interval $[0, 1]$. We omit the details. □

Theorem 6.3 proves that having odd parity on four variables is positively useless but assumes the UGC.
It would seem natural that this theorem should be possible to establish based solely on NP ≠ P, but we
have been unable to do so.

Let us briefly outline the problems encountered. The natural attempt is to try a long-code based proof for
a label cover instance similar to the proof of [Has01]. A major problem seems to be that all known such proofs
read two bits from the same long code. Considering functions $Q$ that benefit from two such bits being equal
gives us trouble through incorrect proofs where each individual long code is constant. For instance we
currently do not know how to show that odd parity is not useful for the "exactly three" function based only
on NP ≠ P.

7 Adaptive Uselessness and Pseudorandomness 

We now discuss the adaptive setting, when we allow the algorithm to choose the new objective function Q 
based on the Max-P instance. Formally, we make the following definition. 

Definition 7.1. The predicate $P$ is adaptively useful if and only if there is an $\epsilon > 0$ such that there is
a polynomial time algorithm which, given a Max-$P$ instance with value $1 - \epsilon$, finds an objective function
$Q : \{-1, 1\}^k \to [0, 1]$ and an assignment $x$ such that

$$\frac{1}{m} \sum_{j=1}^{m} Q(x^{a_j}) \ge \mathop{E}_{x \in \{-1,1\}^k}[Q(x)] + \epsilon.$$

Note that we need to require $Q$ to be bounded since otherwise the algorithm can win by simply scaling
$Q$ by a huge constant. Alternatively, it is easy to see that in the presence of negations adaptive usefulness is
equivalent to requiring that the algorithm finds an assignment $x$ such that the distribution of the $k$-tuples
$\{x^{a_j}\}_{j \in [m]}$ is $\epsilon$-far in statistical distance from uniform for some $\epsilon > 0$ (not the same $\epsilon$ as above). In fact,
since $k$ is constant it is even equivalent to requiring that the min-entropy is bounded away from $k$, in
particular that there is some $a \in \{-1, 1\}^k$ and $\epsilon > 0$ such that at least a $2^{-k} + \epsilon$ fraction of the $x^{a_j}$'s attain
the string $a$.

Adaptive uselessness trivially implies non-adaptive uselessness. In the other direction, with the interpretation
of avoiding uniform $k$-tuples, it is easy to see that the proof of the hardness result based on pairwise
independence from the non-adaptive setting works also for adaptive uselessness.

This result can, by a slightly more careful argument, be extended also to the case without negations.
The characterization is then that the algorithm is attempting to produce a distribution on $k$-tuples that is far
from being uniformly positively correlated. In this setting, it does not seem meaningful to think of adaptive
uselessness as a pseudorandomness property.
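To make the pseudorandomness view concrete, the sketch below computes the empirical distribution of $k$-tuples induced by an assignment and its statistical distance from uniform. The instance encoding (each constraint is a list of `(variable, sign)` literals) is an assumption chosen for this example, not a format fixed by the text.

```python
from collections import Counter
from itertools import product

def tuple_distribution(assignment, constraints):
    """Empirical distribution of the k-tuples seen by the constraints."""
    counts = Counter(tuple(sign * assignment[v] for v, sign in c) for c in constraints)
    m = len(constraints)
    return {t: n / m for t, n in counts.items()}

def distance_from_uniform(dist, k):
    """Statistical (total variation) distance to the uniform distribution on {-1,1}^k."""
    u = 1.0 / 2 ** k
    return 0.5 * sum(abs(dist.get(t, 0.0) - u) for t in product((-1, 1), repeat=k))
```

An adaptively useful algorithm is one that can push this distance above some fixed $\epsilon > 0$; equivalently, it makes some string appear on noticeably more than a $2^{-k}$ fraction of the constraints.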
8 Some New Approximation Resistance Results

In this section we provide two new results on approximation resistance. First, we describe how the pairwise
independence condition of [AM09] can be relaxed somewhat to give approximation resistance for a wider
range of predicates. Second, motivated by Theorem 5.1, we show that there exist approximation resistant
predicates $P$ which are of the form $\mathrm{sgn}(Q)$ for a quadratic form $Q$.



8.1 Relaxed Pairwise Independence Conditions 

Let us first recall one of the few known examples of a predicate that is approximation resistant but not 
hereditarily approximation resistant. 

Example 8.1. Consider the predicate $\mathrm{GLST} : \{-1, 1\}^4 \to \{0, 1\}$ defined by

$$\mathrm{GLST}(x_1, x_2, x_3, x_4) = \begin{cases} x_2 \ne x_3 & \text{if } x_1 = -1 \\ x_2 \ne x_4 & \text{if } x_1 = 1. \end{cases}$$

This predicate was shown to be approximation resistant by Guruswami et al. [GLST98], but there is no
pairwise independent distribution supported on its accepting assignments; indeed it is not difficult to check
that $x_2 x_3 + x_2 x_4 + x_3 x_4 < 0$ for all accepting inputs. This predicate also implies $\mathrm{NAE}(x_2, x_3, x_4)$, the
not-all-equal predicate, and this is known to be non-trivially approximable [Zwi98].

When the predicate GLST is proved to be approximation resistant in [GLST98] the crucial fact is that
not all terms appear in the Fourier expansion of $P$. We have

$$\mathrm{GLST}(x_1, x_2, x_3, x_4) = \frac{1}{2} - \frac{x_2 x_3}{4} - \frac{x_2 x_4}{4} + \frac{x_1 x_2 x_3}{4} - \frac{x_1 x_2 x_4}{4}.$$

The key is that no term in the expansion contains both of the variables $x_3$ and $x_4$, corresponding to two
questions in the PCP that are very correlated and hence giving terms that are hard to control.
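The claims about GLST in Example 8.1 can be checked by brute force over the 16 inputs. The sketch below computes the Fourier coefficients with exact rational arithmetic; the helper names are hypothetical and the predicate is coded directly from the case definition in Example 8.1.

```python
from fractions import Fraction
from itertools import combinations, product

def glst(x):
    """GLST from Example 8.1: x2 != x3 if x1 = -1, and x2 != x4 if x1 = 1."""
    x1, x2, x3, x4 = x
    return int(x2 != x3) if x1 == -1 else int(x2 != x4)

def fourier(P, k):
    """Nonzero Fourier coefficients (uniform distribution), keyed by 1-based index sets."""
    points = list(product((-1, 1), repeat=k))
    coeffs = {}
    for size in range(k + 1):
        for S in combinations(range(1, k + 1), size):
            total = 0
            for x in points:
                chi = 1
                for i in S:
                    chi *= x[i - 1]
                total += P(x) * chi
            c = Fraction(total, len(points))
            if c != 0:
                coeffs[S] = c
    return coeffs
```

Running `fourier(glst, 4)` yields five nonzero coefficients, none of whose index sets contains both 3 and 4, and on every accepting input the sum $x_2 x_3 + x_2 x_4 + x_3 x_4$ equals $-1$.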

In other words, when proving approximation resistance it suffices to only analyze those terms appearing 
in the Fourier expansion of a predicate P. In the context of the pairwise independent condition (which only 
gives UG-hardness, not NP-hardness), this means that it suffices to find a distribution which is pairwise 
independent on those pairs of variables that appear together in some term. 

However, these are not the only situations where we can deduce that P is approximation resistant. 

Example 8.2. Consider the predicate

$$P(x_1, x_2, x_3, x_4) = \mathrm{GLST}(x_1, x_2, x_3, x_4) \vee (x_1 = x_2 = x_3 = x_4 = 1),$$

which is the GLST predicate with the all-ones string as an additional accepting assignment. One can
check that there is no pairwise independent distribution supported on $P^{-1}(1)$, and since $P$ has an odd
number of accepting assignments, all its Fourier coefficients are non-zero. However, Max-$P$ is known to be
approximation resistant [Has05].
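The parity argument in Example 8.2 is that $16 \hat{P}(S)$ is a sum of an odd number of $\pm 1$ terms and hence cannot vanish. A brute-force check, with hypothetical helper names and GLST coded as in Example 8.1, might look as follows; it does not attempt to verify the claim about pairwise independent distributions.

```python
from fractions import Fraction
from itertools import combinations, product

def glst(x):
    x1, x2, x3, x4 = x
    return int(x2 != x3) if x1 == -1 else int(x2 != x4)

def p_example(x):
    """GLST with the all-ones string added as an accepting assignment."""
    return 1 if all(xi == 1 for xi in x) else glst(x)

def fourier_coefficient(P, S, k=4):
    """Fourier coefficient of P on the 1-based index set S (uniform distribution)."""
    total = 0
    for x in product((-1, 1), repeat=k):
        chi = 1
        for i in S:
            chi *= x[i - 1]
        total += P(x) * chi
    return Fraction(total, 2 ** k)

subsets = [S for size in range(5) for S in combinations(range(1, 5), size)]
accepting = [x for x in product((-1, 1), repeat=4) if p_example(x)]
coeffs = {S: fourier_coefficient(p_example, S) for S in subsets}
```

Here `accepting` has 9 elements, every entry of `coeffs` is non-zero, and the coefficient on $\{3, 4\}$ comes out as $1/16$.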

The result of [Has05] proving that this predicate is resistant is somewhat more general. In particular, it
says the following.

Theorem 8.3 ([Has05], Theorem 6.5). Let $P : \{-1, 1\}^4 \to \{0, 1\}$ be a predicate on 4 bits. Suppose
$\hat{P}(\{3, 4\}) \ge 0$ and that $P$ accepts all strings $x_1 x_2 x_3 x_4$ with $\prod_{i=1}^{3} x_i = -1$ and $x_3 = -x_4$. Then $P$ is
approximation resistant.

The statement of [Has05], Theorem 6.5 is slightly different. The above statement is obtained by flipping
the sign of $x_3$ in the statement of [Has05]. We now give a natural generalization of Theorem 8.3 to a much
larger class of predicates (but giving UG-hardness, whereas [Has05] gives NP-hardness). We first define the
specific kind of distributions whose existence gives our hardness result.

Definition 8.4. A distribution $\mu$ over $\{-1, 1\}^k$ covers $S \subseteq [k]$ if there is an $i \in S$ such that $E_\mu[x_i] = 0$ and
$E_\mu[x_i x_j] = 0$ for every $j \in S \setminus \{i\}$.

Definition 8.5. Fix a function $Q : \{-1, 1\}^k \to \mathbb{R}$ and a pair of coordinates $\{i, j\} \subseteq [k]$. We say
that a distribution $\mu$ over $\{-1, 1\}^k$ is $\{i, j\}$-negative with respect to $Q$ if $E_\mu[x_i] = E_\mu[x_j] = 0$ and
$\mathrm{Cov}_\mu[x_i, x_j] \, \hat{Q}(\{i, j\}) \le 0$.

Our most general condition for approximation resistance (generalizing both Theorem 8.3 and [AM09])
is as follows.

Theorem 8.6. Let $P : \{-1, 1\}^k \to \{-1, 1\}$ be a predicate and let $Q : \{-1, 1\}^k \to \mathbb{R}$ be a real valued
function. Suppose there is a probability distribution $\mu$ supported on the strings accepted by $P$ with the
following properties:

• For each pair $\{i, j\} \subseteq [k]$, it holds that $\mu$ is $\{i, j\}$-negative with respect to $Q$.

• For each $S \ne \emptyset$ with $|S| \ne 2$ such that $\hat{Q}(S) \ne 0$, it holds that $\mu$ covers $S$.

Then $P$ is not useful for $Q$, assuming the Unique Games Conjecture. In particular, if the conditions are true
for $Q = P$ then $P$ is approximation resistant.

We give the proof of the theorem in Section 10. Let us now illustrate it by applying it to the earlier
example.

Example 8.7 (Example 8.2 continued). Consider the distribution $\mu$ used to prove approximation resistance
for GLST, i.e., the uniform distribution over the four strings $x_1 x_2 x_3 x_4$ satisfying $x_1 x_2 x_3 = -1$ and
$x_3 = -x_4$ (note that the condition of Theorem 8.3 is precisely that $P$ should accept these inputs). First, it
satisfies

$$\hat{P}(\{3, 4\}) \, E_\mu[x_3 x_4] = \frac{1}{16} \cdot (-1) < 0,$$

and all other pairwise correlations are 0, so $\mu$ satisfies the $\{i, j\}$-negativity condition of Theorem 8.6. Further,
for $|S| > 2$ it holds that either $x_1$ or $x_2$ is in $S$. Since $E_\mu[x_1] = 0$ and $E_\mu[x_1 x_j] = 0$ for all $j \ne 1$
(and similarly for $x_2$), this shows that any $S$ with $|S| > 2$ is covered by $\mu$. Finally since all $E_\mu[x_j] = 0$, all four
singleton $S$ are also covered by $\mu$. Hence Theorem 8.6 implies that Max-$P$ is resistant (under the UGC).

Example 8.8. Consider the predicate $P(x) = x_1 \oplus ((x_2 \oplus x_3) \vee x_4)$. This predicate is known to be
approximation resistant [Has05]. Let us see how to derive this conclusion using Theorem 8.6 (albeit only
under the UGC). The Fourier expansion of $P$ is

$$P(x) = \frac{1}{2} + \frac{x_1}{4} - \frac{x_1 x_4}{4} - \frac{x_1 x_2 x_3}{4} - \frac{x_1 x_2 x_3 x_4}{4},$$

and the distribution we use is uniform over

$$\{ x \in \{-1, 1\}^4 \mid x_1 x_2 x_3 = -1, \; x_4 = 1 \}.$$

Each of $x_1$, $x_2$, $x_3$ is unbiased, and $x_4$ is completely biased but as it does not appear as a singleton in the
expansion of $P$ this is not an issue. Further, all pairwise correlations are 0, and it is easy to check that this is
sufficient for Theorem 8.6 to apply.
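The facts used in Example 8.8 can be verified mechanically. The sketch below assumes the usual convention that $-1$ encodes "true" and $1$ encodes "false"; under that assumption the predicate agrees with the polynomial $\frac{1}{2} + \frac{x_1}{4} - \frac{x_1 x_4}{4} - \frac{x_1 x_2 x_3}{4} - \frac{x_1 x_2 x_3 x_4}{4}$ on all 16 inputs. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def predicate(x):
    """x1 XOR ((x2 XOR x3) OR x4), reading -1 as true and 1 as false (an assumption)."""
    t1, t2, t3, t4 = (xi == -1 for xi in x)
    return int(t1 != ((t2 != t3) or t4))

def expansion(x):
    """Candidate Fourier expansion, evaluated exactly."""
    x1, x2, x3, x4 = x
    return Fraction(1, 2) + Fraction(x1 - x1 * x4 - x1 * x2 * x3 - x1 * x2 * x3 * x4, 4)

# Support of the distribution: x1 x2 x3 = -1 and x4 = 1.
support = [x for x in product((-1, 1), repeat=4) if x[0] * x[1] * x[2] == -1 and x[3] == 1]

def mean(f):
    """Expectation of f under the uniform distribution on the support."""
    return Fraction(sum(f(x) for x in support), len(support))
```

With these definitions one can confirm that the four support strings are all accepted, that the biases are $(0, 0, 0, 1)$, and that every pairwise correlation $E[x_i x_j]$ vanishes.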
We have only used Theorem 8.6 to get approximation resistance in a few examples, but it can also be used
to give examples of $P$ not being useful for $Q$ in various situations; we leave the creation of interesting
such examples to the reader.

We are not aware of any approximation resistant predicates that do not satisfy the conditions given in
Theorem 8.6. On the other hand we see no reason to believe that it is tight.

8.2 Resistant Signs of Quadratic Forms 

Let us consider a slightly different example. Suppose that in the definition of uselessness we only considered 
predicates Q rather than arbitrary real-valued functions. Would we get the same set of useless predicates? 
The answer to this question is not obvious. Any real-valued function $Q$ can be written in the form

$$Q(x) = c_0 + \sum_{P} c_P P(x)$$

where the sum is over different predicates and each coefficient $c_P$ is non-negative. This implies that if $P$ is
useful for a real-valued function $Q$ then there is a collection of predicates such that on any instance we can
do better than random on one of these predicates. This does not imply that there is a single predicate for
which $P$ is useful but it excludes the standard proofs where the instance used to prove that $P$ is useless for
$Q$ is independent of the identity of $Q$.

If one were to single out a candidate predicate for which $P$ is useful, the first candidate that comes to mind
given the discussions of Section 5 is, possibly,

$$P'(x) = \mathrm{sgn}(Q(x))$$

where $Q$ is the quadratic form guaranteed by Theorem 5.1. Note that it may or may not be the case that
$P = P'$. This choice does not always work.

Theorem 8.9. There is a predicate, $P$, of the form $\mathrm{sgn}(Q(x))$ where $Q$ is a quadratic function without a
constant term, that is approximation resistant (assuming the UGC).

Proof. Let $L_1$ and $L_2$ be two linear forms with integer coefficients which only assume odd values and only
depend on variables $x_i$ for $i \ge 3$. Define

$$Q(x) = 10(L_1(x) + x_1)(L_2(x) + x_2) + x_1 L_2(x) + 2 x_2 L_1(x), \qquad (4)$$

and let $P(x) = \mathrm{sgn}(Q(x))$. We establish the following properties of $P$.

1. For all $\alpha$ such that $\{1, 2\} \subseteq \alpha$ we have $\hat{P}_\alpha = 0$.

2. There is a probability distribution $\mu$ supported on strings accepted by $P$ such that $E_\mu[x_i] = 0$ for all $i$
and $E_\mu[x_i x_j] = 0$ for all $i < j$ with $(i, j) \ne (1, 2)$.

These two conditions clearly make it possible to apply Theorem 8.6. Loosely speaking the second
condition makes it possible to construct a PCP such that we can control sums over all nontrivial characters
except those that contain both 1 and 2. The first condition implies that these troublesome terms do not
appear in the expansion of $P$ and hence we can complete the analysis.

We claim that property 1 is equivalent to the statement that every setting of the variables $x_i$ for $i \ge 3$
results in a function on $x_1$ and $x_2$ that has the Fourier coefficient of size 2 equal to 0. In other words it
should be a constant, one of the variables $x_1$ or $x_2$, or the negation of such a variable. Let us check that this
is the case for $Q$ defined by (4).

Fix any value of $x_i$, $i \ge 3$, and we have the following cases.

1. $|L_1| \ge 3$ and $|L_2| \ge 3$.

2. $|L_1| = 1$ and $|L_2| \ne 1$.

3. $|L_2| = 1$.

In the first case clearly the first term determines the sign of $Q$ and $P = \mathrm{sgn}(L_1(x) L_2(x))$ and in particular
the sub-function is independent of $x_1$ and $x_2$.

The second case is almost equally straightforward. When $x_1 = L_1(x)$ the first term dominates
and the answer is $\mathrm{sgn}(x_1 L_2(x))$. When $x_1 = -L_1(x)$ the first term is 0 and as $|L_2(x) x_1| \ge 3$ while
$|L_1(x) x_2| = 1$, so that $|2 x_2 L_1(x)| = 2$, the answer also in this case is $\mathrm{sgn}(x_1 L_2(x))$.

Finally let us consider the third case. Then if $x_2 = L_2(x)$ our function $Q$ reduces to

$$20(L_1(x) + x_1) x_2 + x_2 (2 L_1(x) + x_1)$$

and any nonzero term of this sum has sign $\mathrm{sgn}(x_2 L_1(x))$. Finally if $x_2 = -L_2(x)$ we get

$$x_2 (2 L_1(x) - x_1)$$

and again the sign is that of $\mathrm{sgn}(x_2 L_1(x))$. We conclude that in each case we have one of the desired
functions and property 1 follows.
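Since $Q$ depends on the variables $x_i$, $i \ge 3$, only through the values of $L_1$ and $L_2$, the case analysis for property 1 can be double-checked by enumeration. The sketch below assumes, as in the concrete instantiation used for property 2, that each linear form takes odd values between $-5$ and $5$; the function names are hypothetical.

```python
from itertools import product

def Q(x1, x2, L1, L2):
    """The quadratic form of equation (4), as a function of x1, x2 and the values of L1, L2."""
    return 10 * (L1 + x1) * (L2 + x2) + x1 * L2 + 2 * x2 * L1

def sgn(v):
    return (v > 0) - (v < 0)

def degree_two_coefficient(L1, L2):
    """Coefficient of x1*x2 in the restriction (x1, x2) -> sgn(Q) for fixed L1, L2."""
    return sum(x1 * x2 * sgn(Q(x1, x2, L1, L2))
               for x1, x2 in product((-1, 1), repeat=2)) / 4

odd_values = (-5, -3, -1, 1, 3, 5)
```

For every pair of odd values the restricted function has no $x_1 x_2$ term, and $Q$ is never zero, so the sign is always well defined.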

We establish property 2 in the case when each $L_i$ is the sum of 5 variables not occurring in the other
linear form. Thus for example we might take

$$L_1(x) = x_3 + x_4 + x_5 + x_6 + x_7$$

and

$$L_2(x) = x_8 + x_9 + x_{10} + x_{11} + x_{12}.$$

We describe the distribution $\mu$ in a rather indirect way to later be able to analyze it. Let $c = \frac{7 - \sqrt{41}}{8} \approx 0.0746$.

1. Fix $|L_1(x)|$ to 1, 3 or 5 with probabilities $\frac{1}{2} + 2c$, $\frac{1}{2} - 3c$, and $c$, respectively.

2. Fix $|L_2(x)|$ to 1 or 3 each with probability $\frac{1}{2}$.

3. Pick a random $b \in \{-1, 1\}$ taking each value with probability $\frac{1}{2}$.

4. Suppose $|L_1(x)| \ge 3$ and $|L_2(x)| \ge 3$. Set $\mathrm{sgn}(L_1(x)) = \mathrm{sgn}(L_2(x)) = b$ and $x_1 = x_2 = -b$.

5. Suppose $|L_2| \ne 1$ and $|L_1| = 1$. Set $\mathrm{sgn}(L_1(x)) = -\mathrm{sgn}(L_2(x)) = b$ and $x_1 = x_2 = -b$.

6. Suppose $|L_2| = 1$. Set $\mathrm{sgn}(L_1(x)) = x_1 = x_2 = b$, and set $\mathrm{sgn}(L_2(x)) = -b$ with probability $\frac{1 + 12c}{2}$
and $\mathrm{sgn}(L_2(x)) = b$ with probability $\frac{1 - 12c}{2}$.

7. Choose $x_i$ for $i \ge 3$ uniformly at random given the values $L_1(x)$ and $L_2(x)$.

Now first note that by the analysis in establishing property 1 we always pick an assignment such that
$Q(x) > 0$. This follows as in the three cases the output of the function is $\mathrm{sgn}(L_1(x) L_2(x))$, $\mathrm{sgn}(x_1 L_2(x))$,
and $\mathrm{sgn}(L_1(x) x_2)$, respectively, and they are all chosen to be $b^2 = 1$.

As $b$ is a random bit, it is easy to see that $E[x_i] = 0$ for any $i$ and we need to analyze $E[x_i x_j]$ for
$(i, j) \ne (1, 2)$. In our distribution we always have $x_1 = x_2$ and the variables in $L_1$ and $L_2$ are treated
symmetrically and hence it is sufficient to establish the following five facts.

1. $E[L_1^2(x)] = 5$.

2. $E[L_2^2(x)] = 5$.

3. $E[x_1 L_1(x)] = 0$.

4. $E[x_1 L_2(x)] = 0$.

5. $E[L_1(x) L_2(x)] = 0$.

The first expected value equals

$$\left(\tfrac{1}{2} + 2c\right) \cdot 1 + \left(\tfrac{1}{2} - 3c\right) \cdot 9 + c \cdot 25 = 5$$

while the second equals

$$\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 9 = 5.$$

For the third expected value note that $x_1 L_1(x) = -|L_1(x)|$ when $|L_2(x)| = 3$ while it equals
$|L_1(x)|$ when $|L_2(x)| = 1$. The two cases each happen with probability 1/2 and as $|L_1(x)|$ is independent
of $|L_2(x)|$ the equality follows.

To analyze the fourth value first observe that conditioned on $|L_2(x)| = 1$ we have $E[x_1 L_2(x)] = -12c$.
On the other hand when $|L_2(x)| = 3$ we have

$$E[x_1 L_2(x)] = 3\left(\tfrac{1}{2} + 2c\right) - 3\left(\tfrac{1}{2} - 3c\right) - 3c = 12c,$$

giving the result in this case. Finally, conditioned on $|L_2(x)| = 1$ we have

$$E[L_1(x) L_2(x)] = -12c\left(\left(\tfrac{1}{2} + 2c\right) + 3\left(\tfrac{1}{2} - 3c\right) + 5c\right) = -(24c - 24c^2)$$

and conditioned on $|L_2(x)| = 3$ we have

$$E[L_1(x) L_2(x)] = -3\left(\tfrac{1}{2} + 2c\right) + 9\left(\tfrac{1}{2} - 3c\right) + 15c = 3 - 18c,$$

giving a total expected value of

$$\tfrac{1}{2}\left(3 + 24c^2 - 42c\right)$$

and $c$ was chosen carefully to make this quantity 0. □
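The five moment identities can be sanity-checked numerically from steps 1 to 6 of the construction. The sketch below takes $c$ to be the smaller root of $3 + 24c^2 - 42c = 0$, namely $(7 - \sqrt{41})/8 \approx 0.0746$, and fixes $b = 1$, which suffices because all five quantities are unchanged when $b$ is negated. It tracks only $x_1$, $L_1$ and $L_2$; the passage from these moments to the individual variables relies on the symmetry argument in the text.

```python
import math

c = (7 - math.sqrt(41)) / 8  # smaller root of 3 + 24c^2 - 42c = 0

def outcomes():
    """Yield (probability, x1, L1, L2) for b = 1."""
    for a1, p1 in ((1, 0.5 + 2 * c), (3, 0.5 - 3 * c), (5, c)):
        # |L2| = 3 (steps 4 and 5): sgn(L1) = b, x1 = -b, and sgn(L2) = b iff |L1| >= 3
        yield 0.5 * p1, -1, a1, (3 if a1 >= 3 else -3)
        # |L2| = 1 (step 6): sgn(L1) = x1 = b, sgn(L2) = -b with probability (1 + 12c)/2
        yield 0.5 * p1 * (1 + 12 * c) / 2, 1, a1, -1
        yield 0.5 * p1 * (1 - 12 * c) / 2, 1, a1, 1

def expect(f):
    """Expectation of f(x1, L1, L2) under the distribution above."""
    return sum(p * f(x1, L1, L2) for p, x1, L1, L2 in outcomes())
```

Evaluating `expect` on $L_1^2$, $L_2^2$, $x_1 L_1$, $x_1 L_2$ and $L_1 L_2$ reproduces the values 5, 5, 0, 0, 0 up to floating-point error.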



9 One Result at the Other End of the Spectrum 

We have focused on computationally useless predicates that do not enable us to do essentially anything. 
Knowing that there is an assignment that satisfies almost all the conditions does not enable us to do better 
for any function. 

At the other end of the spectrum we could hope for predicates where even more moderate promises can 
be sufficient to find useful assignments efficiently. 

One possibility is to ask for a predicate that is useful for all functions $Q$. This is too much to ask for:
as discussed in Section 3, if $P$ and $Q$ are sufficiently unrelated it might be the case that there are instances
where we can satisfy $P$ on all constraints while the best assignment when we consider condition $Q$ only
satisfies essentially a fraction $E_Q$. One possible definition is to say that $P$ should be useful for any $Q$ which
is not excluded by this information theoretic argument. This is a potential avenue of research which we have
not explored and hence we have no strong feeling about what to expect. One complication here is of course
that the characterization of Theorem 3.2 is not very explicit and hence might be difficult to work with.

The $Q$ we must always consider is the traditional question of approximability, namely $Q = P$, but let us
weaken the promise from the optimum being almost one to being just slightly above the random threshold.

Definition 9.1. A predicate $P$ is fully approximable if for any $\epsilon > 0$ there is a $\delta > 0$ such that if the
optimal value of a Max-$P$ instance is $E_P + \epsilon$ then one can efficiently find an assignment that satisfies an
$(E_P + \delta)$-fraction of the constraints.

First note that the most famous example of a fully approximable predicate is Max-Cut and in fact any
predicate of arity two is fully approximable. This definition has been explored previously in [Has07] but
given that this is not a standard venue for results on Max-CSPs let us restate the theorem, which says that any
fully approximable predicate is in fact a real valued sum of predicates of arity two.

Theorem 9.2. [Has07] A predicate $P$ is fully approximable if and only if the Fourier expansion of $P$
contains no term of degree at least 3.

We refer to [Has07] for the not too difficult proof. It is an amusing exercise to find the complete list of
such predicates. A predicate on three variables is fully approximable iff it accepts equally many even and
odd strings. Up to negations and permutations of variables, the only predicate that depends genuinely on
four variables with this property is

$$\frac{2 + x_1 x_3 + x_1 x_4 + x_2 x_3 - x_2 x_4}{4}.$$
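That this degree-2 polynomial really is a predicate is quick to confirm by enumeration. The sketch below assumes the normalization by 4, which is what makes the values land in $\{0, 1\}$, and also checks that all four variables matter; the helper names are hypothetical.

```python
from itertools import product

def predicate(x):
    """(2 + x1 x3 + x1 x4 + x2 x3 - x2 x4) / 4 on inputs from {-1, 1}^4."""
    x1, x2, x3, x4 = x
    return (2 + x1 * x3 + x1 * x4 + x2 * x3 - x2 * x4) / 4

values = {x: predicate(x) for x in product((-1, 1), repeat=4)}

def depends_on(i):
    """True if flipping coordinate i changes the value on at least one input."""
    return any(predicate(x) != predicate(x[:i] + (-x[i],) + x[i + 1:]) for x in values)
```

The reason it works is that $x_1 x_3 + x_1 x_4 + x_2 x_3 - x_2 x_4 = x_1 (x_3 + x_4) + x_2 (x_3 - x_4)$, and exactly one of $x_3 + x_4$ and $x_3 - x_4$ is non-zero, so the sum is always $\pm 2$.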



10 Proofs of UG-Hardness 



In this section we give the proofs of Theorems 6.3 and 8.6, which extend [AM09]. It is well-known that
the key part in deriving UG-hardness for a CSP is to design dictatorship tests with appropriate properties;
see e.g. [Rag08] for details.



10.1 Background: Polynomials, Quasirandomness and Invariance 



To set up the dictatorship test we need to mention some background material. 

For $b \in [-1, 1]$, we use $\{-1, 1\}^n_{(b)}$ to denote the $n$-dimensional Boolean hypercube with the $b$-biased
product distribution, i.e., if $x$ is a sample from $\{-1, 1\}^n_{(b)}$ then the expectation of the $i$'th coordinate is $E[x_i] = b$
(equivalently, $x_i = 1$ with probability $(1 + b)/2$), independently for each $i \in [n]$. Whenever we have a
function $f : \{-1, 1\}^n_{(b)} \to \mathbb{R}$ we think of it as a random variable and hence expressions like $E[f]$, $\mathrm{Var}[f]$,
etc., are interpreted as being with respect to the $b$-biased distribution. We equip $L^2(\{-1, 1\}^n_{(b)})$ with the
inner product $\langle f, g \rangle = E[f \cdot g]$ for $f, g : \{-1, 1\}^n_{(b)} \to \mathbb{R}$.

For $S \subseteq [n]$ define $\chi_S : \{-1, 1\}^n_{(b)} \to \mathbb{R}$ by

$$\chi_S(x) = \prod_{i \in S} \chi(x_i),$$

where $\chi : \{-1, 1\}_{(b)} \to \mathbb{R}$ is defined by

$$\chi(x_i) = \frac{x_i - E[x_i]}{\sqrt{\mathrm{Var}[x_i]}} = \frac{x_i - b}{\sqrt{1 - b^2}}.$$

The functions $\{\chi_S\}_{S \subseteq [n]}$ form an orthonormal basis with respect to the inner product $\langle \cdot, \cdot \rangle$ on $L^2(\{-1, 1\}^n_{(b)})$
and thus any function $f : \{-1, 1\}^n_{(b)} \to \mathbb{R}$ can be written as

$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S; b) \chi_S(x),$$

where $\hat{f}(S; b)$ are the Fourier coefficients of $f$ (with respect to the $b$-biased distribution).

With this view in mind it is convenient to think of functions $f$ in $L^2(\{-1, 1\}^n_{(b)})$ as multilinear polynomials
$F : \mathbb{R}^n \to \mathbb{R}$ in the random variables $X_i = \chi(x_i)$, viz.,

$$F(X) = \sum_{S \subseteq [n]} \hat{f}(S; b) \prod_{i \in S} X_i.$$

We say that such a polynomial is $(d, \tau)$-quasirandom if for every $i \in [n]$ it holds that

$$\sum_{\substack{i \in S \subseteq [n] \\ |S| \le d}} \hat{f}(S; b)^2 \le \tau.$$

A function $f : \{-1, 1\}^n_{(b)} \to \mathbb{R}$ is said to be a dictator if $f(x) = x_i$ for some $i \in [n]$, i.e., $f$ simply returns
the $i$'th coordinate. The polynomial $F$ corresponding to a dictator is $F(X) = b + \sqrt{1 - b^2} X_i$. Note that
a dictator is in some sense the extreme opposite of a $(d, \tau)$-quasirandom function as a dictator is not even
$(1, \tau)$-quasirandom for any $\tau < 1 - b^2$.
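For small $n$ the biased Fourier coefficients and the quantity bounded in the quasirandomness condition can be computed by direct enumeration. The sketch below is an illustration only: it assumes $|b| < 1$, uses 0-based coordinates, and the function names are hypothetical.

```python
import math
from itertools import combinations, product

def chi(xi, b):
    """Normalized coordinate function (x_i - b) / sqrt(1 - b^2)."""
    return (xi - b) / math.sqrt(1 - b * b)

def biased_coefficient(f, S, n, b):
    """Fourier coefficient of f on S with respect to the b-biased distribution."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        weight = math.prod((1 + b * xi) / 2 for xi in x)  # probability of x
        total += weight * f(x) * math.prod(chi(x[i], b) for i in S)
    return total

def low_degree_influence(f, i, n, b, d):
    """Sum of squared coefficients over sets S containing i with |S| <= d."""
    return sum(biased_coefficient(f, S, n, b) ** 2
               for size in range(1, d + 1)
               for S in combinations(range(n), size) if i in S)
```

For a dictator $f(x) = x_0$ with $b = 1/2$ this gives the coefficients $b = 0.5$ on the empty set and $\sqrt{1 - b^2} \approx 0.866$ on $\{0\}$, matching $F(X) = b + \sqrt{1 - b^2} X_0$, and a degree-1 influence of $1 - b^2 = 0.75$ on coordinate 0.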

We are interested in distributions $\mu$ over $\{-1, 1\}^k$. In a typical situation we pick $n$ independent samples
of $\mu$, resulting in $k$ strings $x_1, \ldots, x_k$ of length $n$, and to each such string we apply some function
$f : \{-1, 1\}^n \to \{-1, 1\}$. With this in mind, define the following $k \times n$ matrix $X$ of random variables. The
$j$'th column, which we denote by $X^j$, has the distribution obtained by picking a sample $x \in \{-1, 1\}^k$ from
$\mu$ and letting $X^j_i = \chi(x_i)$, independently for each $j \in [n]$. Then, the distribution of $(f(x_1), \ldots, f(x_k))$ is
the same as the distribution of $(F(X_1), \ldots, F(X_k))$, where $X_i$ denotes the $i$'th row of $X$.

Now, we are ready to state the version of the invariance principle [MOO10, Mos10] that we need.

Theorem 10.1. For any $\alpha > 0$, $\epsilon > 0$, $b \in [-1, 1]$, $k \in \mathbb{N}$ there are $d, \tau > 0$ such that the following holds.
Let $\mu$ be any distribution over $\{-1, 1\}^k$ satisfying:

1. $E_{x \sim \mu}[x_i] = b$ for every $i \in [k]$ (i.e., all biases are identical).

2. $\mu(x) \ge \alpha$ for every $x \in \{-1, 1\}^k$ (i.e., $\mu$ has full support).

Let $X$ be the $k \times n$ matrix defined above, and let $Y$ be a $k \times n$ matrix of standard jointly Gaussian variables
with the same covariances as $X$. Then, for any $(d, \tau)$-quasirandom multilinear polynomial $F : \mathbb{R}^n \to \mathbb{R}$, it
holds that

$$\left| E\left[\prod_{i=1}^{k} F(X_i)\right] - E\left[\prod_{i=1}^{k} F(Y_i)\right] \right| \le \epsilon.$$



10.2 The Dictatorship Test 

The dictatorship tests we use to prove Theorems 6.3 and 8.6 are both instantiations of the test used in
[AM09], with slightly different analyses, so we start by recalling how this test works.

In what follows we extend the domain of our predicate $P : \{-1, 1\}^k \to \{0, 1\}$ to $[-1, 1]^k$ multilinearly.
Thus, we have $P : [-1, 1]^k \to [0, 1]$.

Input: A function $f : \{-1, 1\}^L \to [-1, 1]$
Output: Accept/Reject

1. Let $\mu_\epsilon = (1 - \epsilon)\mu + \epsilon U_b$, where $U_b$ denotes the product distribution over $\{-1, 1\}^k$ where each
bit has bias $b$.

2. Pick $L$ independent samples from $\mu_\epsilon$, giving $k$ vectors $x_1, \ldots, x_k \in \{-1, 1\}^L$.

3. Accept with probability $P(f(x_1), \ldots, f(x_k))$.

Figure 1: Dictatorship test

To prove hardness for Max-$P$, one analyzes the performance of the dictatorship test in Figure 1, which
uses a distribution $\mu$ over $\{-1, 1\}^k$ that we assume is supported on $P^{-1}(1)$ and satisfies condition (1) of
Theorem 10.1, which is the case in both Theorems 6.3 and 8.6.

The completeness property of the test is easy to establish (and only depends on $\mu$ being supported on
strings accepted by $P$).

Lemma 10.2. If $f$ is a dictatorship function then the test accepts with probability $\ge 1 - \epsilon$.
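The completeness argument can be illustrated with a direct simulation of the test. The sketch below is a simplification under stated assumptions: $\mu$ is uniform on a given support list, $f$ is $\pm 1$-valued (so the multilinear extension of $P$ is never needed), and the function names are hypothetical.

```python
import random
from itertools import product

def sample_mu_eps(support, k, b, eps, rng):
    """One sample from (1 - eps) * mu + eps * U_b, with mu uniform on `support`."""
    if rng.random() < eps:
        return tuple(1 if rng.random() < (1 + b) / 2 else -1 for _ in range(k))
    return rng.choice(support)

def run_test(f, P, support, k, L, b, eps, rng):
    """Draw L samples, form the k strings x_1..x_k in {-1,1}^L, return P(f(x_1), ..., f(x_k))."""
    columns = [sample_mu_eps(support, k, b, eps, rng) for _ in range(L)]
    rows = [tuple(col[i] for col in columns) for i in range(k)]
    return P(tuple(f(row) for row in rows))
```

A dictator $f(x) = x_j$ reads only the $j$'th sample, so the test outputs $P$ evaluated on a single draw from $\mu_\epsilon$; with probability at least $1 - \epsilon$ that draw comes from $\mu$ and is therefore accepted.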

For the soundness analysis, the arguments are going to be slightly different for the two Theorems 6.3
and 8.6. It is convenient to view $f$ in its multilinear form as described in the previous section. Thus, instead
of looking at $f(x_1), \dots, f(x_k)$ we look at $F(X_1), \dots, F(X_k)$. In both cases, the goal is to prove that there
are $d$ and $\tau$ such that if $F$ is $(d, \tau)$-quasirandom then the expectation of $Q(F(X_1), \dots, F(X_k))$ is small (at
most $E_Q + \epsilon$ for Theorem 8.6 and at most $E_Q^+ + \epsilon$ for Theorem 6.3).

In general, it is also convenient to apply the additional guarantee that $F$ is balanced (i.e., satisfying
$\mathbb{E}[F(X)] = 0$). This can be achieved by the well-known trick of folding, and is precisely what causes the
resulting Max-P instances to have negated literals. In other words, when we prove Theorem 6.3 on the
hardness of Max-$P^+$, we are not going to be able to assume this.

Theorem 8.6: Relaxed Approximation Resistance Conditions. The precise soundness property for
Theorem 8.6 is as follows.

Lemma 10.3. Suppose $\mu$ is a distribution as in the statement of Theorem 8.6 and that the function $F$ is
folded. Then for every $\epsilon > 0$ there are $d, \tau$ such that whenever $F$ is $(d, \tau)$-quasirandom then

$$\mathbb{E}[Q(F(X_1), \dots, F(X_k))] \le E_Q + \epsilon.$$

Note that here the distribution $\mu_\epsilon$ is unbiased, in which case the distribution of each column of
$X$ is simply the distribution $\mu_\epsilon$ itself.

Proof. We write $Q(x) = \sum_{S \subseteq [k]} \hat{Q}(S) \prod_{i \in S} x_i$, where $\hat{Q}(S)$ are the Fourier coefficients of $Q$ with respect
to the uniform distribution, and $\hat{Q}(\emptyset) = E_Q$.
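The expansion used in the proof can be computed by brute force for small $k$. The sketch below is only an illustration: the objective `Q` is a hypothetical example, sets $S$ are represented as tuples of indices, and the coefficient for the empty set is compared with the average of $Q$ over uniform inputs, which is how $E_Q$ is used here.

```python
import itertools
import math

def fourier_coefficients(Q, k):
    """Return {S: Q_hat(S)}, where Q_hat(S) = E_x[Q(x) * prod_{i in S} x_i]
    and x is uniform on {-1, 1}^k."""
    points = list(itertools.product([-1, 1], repeat=k))
    coeffs = {}
    for size in range(k + 1):
        for S in itertools.combinations(range(k), size):
            total = sum(Q(x) * math.prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs

def evaluate_expansion(coeffs, x):
    """Evaluate sum_S Q_hat(S) * prod_{i in S} x_i at the point x."""
    return sum(c * math.prod(x[i] for i in S) for S, c in coeffs.items())

# Hypothetical real-valued objective on three bits.
k = 3
Q = lambda x: 1.0 if sum(x) > 0 else -0.5
coeffs = fourier_coefficients(Q, k)
# Average of Q under a uniformly random assignment.
E_Q = sum(Q(x) for x in itertools.product([-1, 1], repeat=k)) / 2 ** k
```

Here `coeffs[()]` is the coefficient of the empty set and agrees with `E_Q`, and `evaluate_expansion(coeffs, x)` reproduces `Q(x)` on every Boolean point.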

We analyze the expectation of $Q$ term by term. Fix a set $\emptyset \ne S \subseteq [k]$ and let us analyze $\mathbb{E}[\prod_{i \in S} F(X_i)]$.
Let $d, \tau$ be the values given by Theorem 10.1 when applied with $\epsilon$ chosen as $\epsilon/2^k$ and the $\alpha$ given by the
distribution $\mu_\epsilon$ (note that this distribution satisfies the conditions of Theorem 10.1). There are two cases.

Case 1: $|S| = 2$. Let $S = \{i, j\}$. The conditions on $\mu$ guarantee that $\mu$ is $\{i, j\}$-negative with respect to $Q$,
i.e., for any column $a$ we have $\mathbb{E}[X_i^a] = \mathbb{E}[X_j^a] = 0$ and $\hat{Q}(S)\,\mathbb{E}[X_i^a X_j^a] \le 0$. Let $\rho = \mathbb{E}[X_i^a X_j^a]$ (as
each column $a$ is identically distributed this value does not depend on $a$). Then we have

$$\hat{Q}(S)\,\mathbb{E}[F(X_i) F(X_j)] = \hat{Q}(S)\,\mathbb{S}_\rho(F),$$

where $\mathbb{S}_\rho(F)$ is the noise stability of $F$ at $\rho$. Moreover, since the function $F$ is folded it is an odd
function, which in particular implies that $\mathbb{S}_\rho(F)$ is odd (as a function of $\rho$) as well, so that $\mathrm{sgn}(\mathbb{S}_\rho(F)) = \mathrm{sgn}(\rho)$. Since
$\hat{Q}(S)\rho \le 0$ it follows that $\hat{Q}(S)\,\mathbb{S}_\rho(F) \le 0$ as well, so $S$ cannot even give a positive contribution to
the acceptance probability.
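The sign claim in Case 1 can be spelled out using the standard Fourier formula for noise stability. The derivation below is a sketch under the assumption that the rows $X_i$ and $X_j$ are each uniform on $\{-1,1\}^L$ and $\rho$-correlated coordinate-wise, with $\hat{F}(T)$ denoting the Fourier coefficients of $F$ (notation introduced here for illustration).

```latex
\begin{align*}
\mathbb{S}_\rho(F)
  &= \sum_{T \subseteq [L]} \rho^{|T|}\, \hat{F}(T)^2
   = \sum_{|T| \text{ odd}} \rho^{|T|}\, \hat{F}(T)^2
   && \text{($F$ odd, so $\hat{F}(T) = 0$ for even $|T|$)} \\
  &= \rho \sum_{|T| \text{ odd}} \rho^{|T|-1}\, \hat{F}(T)^2,
   && \text{where each $\rho^{|T|-1} \ge 0$ since $|T|-1$ is even} \\
\hat{Q}(S)\, \mathbb{S}_\rho(F)
  &= \hat{Q}(S)\,\rho \cdot \sum_{|T| \text{ odd}} \rho^{|T|-1}\, \hat{F}(T)^2 \;\le\; 0
   && \text{because } \hat{Q}(S)\,\rho \le 0 .
\end{align*}
```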

Case 2: $|S| \ne 2$, $\hat{Q}(S) \ne 0$. This is the more interesting case. The conditions on $\mu$ guarantee that $\mu$ covers
$S$, i.e., there is an $i^* \in S$ such that $\mathbb{E}[X_{i^*}^a] = 0$ and $\mathbb{E}[X_{i^*}^a X_j^a] = 0$ for all $j \in S \setminus \{i^*\}$. By
Theorem 10.1 we know that if $F$ is $(d, \tau)$-quasirandom we have

$$\left| \mathbb{E}\left[\prod_{i \in S} F(X_i)\right] - \mathbb{E}\left[\prod_{i \in S} F(Y_i)\right] \right| \le \epsilon/2^k,$$

where $Y$ is a jointly Gaussian matrix with the same first and second moments as $X$. But then, for every
column $a$, the conditions on the second moments imply that $Y_{i^*}^a$ is a standard Gaussian completely
independent from all other entries of $Y$. This implies that

$$\mathbb{E}\left[\prod_{i \in S} F(Y_i)\right] = \mathbb{E}[F(Y_{i^*})] \cdot \mathbb{E}\left[\prod_{i \in S \setminus \{i^*\}} F(Y_i)\right] = 0,$$

where the second equality is by the assumption that $F$ is folded. This implies that $S$ has at most a
negligible contribution of $\epsilon/2^k$ to the acceptance probability of the test.

□ 



Theorem 6.3: No Negations. As mentioned earlier, in the case when negated literals are not allowed, we
can no longer assume that $F$ is folded. Furthermore, the distribution $\mu$ over $\{-1,1\}^k$ used is only assumed
to be pairwise uniformly correlated. The precise soundness is as follows.

Lemma 10.4. Suppose $\mu$ is a uniformly positively correlated distribution. Then for every $Q : [-1,1]^k \to
[-1,1]$ and $\epsilon > 0$ there are $d, \tau$ such that whenever $F$ is $(d, \tau)$-quasirandom then

$$\mathbb{E}[Q(F(X_1), \dots, F(X_k))] \le E_Q^+ + \epsilon.$$



Proof. Similarly to the previous lemma, we are going to take $d, \tau$ to be the values given by Theorem 10.1
with $\epsilon$ chosen as $\epsilon/2^{k+1}$.

Note that since $\mu_\epsilon$ is a combination of $\mu$ and $U_b$, both being uniformly positively correlated, $\mu_\epsilon$ is also
uniformly positively correlated.

Let $b = \mathbb{E}_{x \sim \mu_\epsilon}[x_i]$ and $\rho = \mathbb{E}_{x \sim \mu_\epsilon}[x_i x_j] \ge b^2$ be the bias and correlation of $\mu_\epsilon$, respectively. Define a
new distribution $\eta$ over $\{-1,1\}^k$ as

$$\eta = c\, U_{\sqrt{\rho}} + (1 - c)\, U_{-\sqrt{\rho}},$$

where $c = \frac{1 + b/\sqrt{\rho}}{2} \in [0, 1]$ (recall that $U_{\sqrt{\rho}}$ denotes the product distribution where all biases are $\sqrt{\rho}$).
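The mixture just defined is meant to reproduce the bias $b$ and the pairwise correlation $\rho$ of $\mu_\epsilon$. The following sketch checks this exactly for small $k$. It assumes the mixing weight $c = (1 + b/\sqrt{\rho})/2$, which is the value forced by requiring the mixture to have bias $b$, and it assumes $\rho > 0$; the numerical values of `b`, `rho` and `k` are hypothetical.

```python
import itertools
import math

def product_pmf(bias, k):
    """pmf of the product distribution on {-1, 1}^k in which every bit has
    expectation `bias`."""
    return {x: math.prod((1 + bias * xi) / 2 for xi in x)
            for x in itertools.product([-1, 1], repeat=k)}

def mixture_pmf(b, rho, k):
    """eta = c * U_{sqrt(rho)} + (1 - c) * U_{-sqrt(rho)}."""
    s = math.sqrt(rho)
    c = (1 + b / s) / 2  # assumed weight: solves (2c - 1) * sqrt(rho) = b
    plus, minus = product_pmf(s, k), product_pmf(-s, k)
    return {x: c * plus[x] + (1 - c) * minus[x] for x in plus}

def moments(pmf, k):
    """First moments E[x_i] and pairwise second moments E[x_i x_j]."""
    first = [sum(p * x[i] for x, p in pmf.items()) for i in range(k)]
    second = {(i, j): sum(p * x[i] * x[j] for x, p in pmf.items())
              for i, j in itertools.combinations(range(k), 2)}
    return first, second

# Hypothetical values with rho >= b**2 and rho > 0.
b, rho, k = 0.2, 0.25, 3
first, second = moments(mixture_pmf(b, rho, k), k)
```

Every entry of `first` equals `b` and every entry of `second` equals `rho` up to floating-point error: both components contribute $\rho$ to $\mathbb{E}[x_i x_j]$, while the biases $\pm\sqrt{\rho}$ average to $(2c - 1)\sqrt{\rho} = b$.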

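Then $\eta$ has the same first and second moments as $\mu_\epsilon$ and therefore, writing $Z$ for the corresponding
matrix from $\eta$, we can apply Theorem 10.1 twice and see that for every $S \subseteq [k]$

$$\left| \mathbb{E}\left[\prod_{i \in S} F(X_i)\right] - \mathbb{E}\left[\prod_{i \in S} F(Z_i)\right] \right| \le \epsilon/2^k,$$

implying

$$\left| \mathbb{E}[Q(F(X_1), \dots, F(X_k))] - \mathbb{E}[Q(F(Z_1), \dots, F(Z_k))] \right| \le \epsilon.$$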
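The step from the per-set bound to the bound on $Q$ is a triangle inequality over the Fourier expansion of $Q$. The following derivation is a sketch of that step; it uses the per-set bound $\epsilon/2^k$ and the fact that $|\hat{Q}(S)| \le 1$ because $Q$ takes values in $[-1, 1]$.

```latex
\begin{align*}
\left| \mathbb{E}[Q(F(X_1),\dots,F(X_k))] - \mathbb{E}[Q(F(Z_1),\dots,F(Z_k))] \right|
  &= \left| \sum_{S \subseteq [k]} \hat{Q}(S)
       \left( \mathbb{E}\Big[\prod_{i \in S} F(X_i)\Big]
            - \mathbb{E}\Big[\prod_{i \in S} F(Z_i)\Big] \right) \right| \\
  &\le \sum_{S \subseteq [k]} |\hat{Q}(S)| \cdot \frac{\epsilon}{2^k}
   \;\le\; 2^k \cdot \frac{\epsilon}{2^k} \;=\; \epsilon .
\end{align*}
```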

Now, the column $Z^1$ of $Z$ can be written as a convex combination of two product distributions $R^+$ and
$R^-$ over $\mathbb{R}^k$ (resulting from applying the character $\chi$ to $U_{\sqrt{\rho}}$ and $U_{-\sqrt{\rho}}$, respectively). By linearity of
expectation, we can replace $Z^1$ with one of $R^+$ and $R^-$ without decreasing the expectation of $Q(F(\cdot), \dots, F(\cdot))$.
Repeating this for all columns, we end up with a random matrix $W$, each column of which is either
distributed like $R^+$ or like $R^-$, and satisfying

$$\mathbb{E}[Q(F(W_1), \dots, F(W_k))] \ge \mathbb{E}[Q(F(Z_1), \dots, F(Z_k))].$$

But now since each column of $W$ is distributed according to a product distribution (with identical marginals),
the rows of $W$ are independent and identically distributed, implying that

$$\mathbb{E}[Q(F(W_1), \dots, F(W_k))] \le E_Q^+.$$

Combining all our inequalities, we end up with

$$\mathbb{E}[Q(F(X_1), \dots, F(X_k))] \le E_Q^+ + \epsilon,$$

as desired. □



11 Concluding Remarks 

We have introduced a notion of (computational) uselessness of constraint satisfaction problems, and showed
that, assuming the Unique Games Conjecture, this notion admits a very clean characterization. This
is in contrast to the related and more well-studied notion of approximation resistance, where the indications
are that a characterization, if there is a reasonable one, should be more complicated.

Our inability to obtain any non-trivial NP-hardness results for positive uselessness, instead of Unique
Games-based hardness, is frustrating. While [Has01] proves odd parity of four variables to be positively
approximation resistant, obtaining positive uselessness by the same method appears challenging.

Another direction of future research is understanding uselessness in the completely satisfiable case. 

We have focused on CSPs defined by a single predicate P (with or without negated literals). It would 
be interesting to generalize the notion of usefulness to a general CSP (defined by a family of predicates). 
Indeed, it is not even clear what the correct definition is in this setting, and we leave this as a potential avenue 
for future work. Another possible direction is to consider an analogous notion for the decision version of a 
CSP rather than the optimization version. 

Acknowledgement. We are grateful to a number of anonymous referees for useful comments on the
presentation of this paper.